Discussion: [lkb] Parsing rich data with LKB
Katya Alahverdzhieva
2010-03-17 13:48:24 UTC
Dear LKB people,

How would you go about using the LKB to parse data that is richer than plain
text, and to define temporal constraints over it? In particular, how do I parse
input that comes not as a stream of tokens but as a list of feature structures?

I have a corpus of transcriptions of spoken language, annotated with gesture
and prosody information, including the times at which the words and gestures
were performed. I'm trying to write an LKB grammar whose rules take into
account the timestamps, the pitch accents and the gestures, all represented as
sets of feature-value pairs.

For instance, I need to somehow capture in my grammar rules the notion of
temporal overlap, i.e. whether a gesture happens at the same time as a word or
sequence of words. I am also trying to parse richer input in which words are
not just tokens but whole feature structures (containing prosody, timestamps
and a gesture description).
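
To make that concrete, a single word in my data corresponds to something like
the following, written TDL-style; all type and feature names here are invented,
just to illustrate the shape of the input:

  ; one annotated word -- purely illustrative feature geometry
  w3-right := annotated-word &
    [ ORTH "right",
      PROSODY [ PITCH-ACCENT h-star ],
      SPAN [ START "1.24", END "1.57" ],
      GESTURE [ HAND right-hand,
                SHAPE pointing,
                SPAN [ START "1.20", END "1.69" ] ] ].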

In practical terms, what would your recommended approach be to parsing
structured data like this and to comparing the timing of different events? Is
there existing software or are there plugins for this? Does anyone know of any
examples I could look at to see how it's done?

Thanks in advance for any hints!

Cheers
Katya
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
Glenn Slayden
2010-03-20 19:28:24 UTC
I'm certainly no expert on the LKB, but only two approaches come to mind.
Both are somewhat hacky.

1. You could attach your metadata to lemmas as coded suffixes, and then use
the LKB's morphological machinery to map these suffixes onto the appropriate
feature structures (rough sketch below). This approach would let you use a
static grammar to parse unseen sentences, which is probably what you want.

2. You could programmatically generate TDL containing the feature structures
for a particular input, and then load that as part of a "grammar" that is
specific to parsing that one input (also sketched below).
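
For (1), the kind of thing I have in mind is roughly the following; every type
and feature name is invented, and you would have to fit it into your own type
hierarchy. The input word carries a coded suffix (say "rightgovl" for "right"
overlapped by a gesture), an irule associates that suffix with a lexical rule,
and the lexical rule adds the marking:

  ; irules.tdl -- associate the invented "govl" suffix with the rule below
  gesture-overlap-irule :=
  %suffix (* govl)
  gesture-overlap-lex-rule.

  ; lrules.tdl -- add the gesture marking; the supertype and the feature
  ; geometry are placeholders for whatever your grammar actually defines
  gesture-overlap-lex-rule := my-copying-lex-rule &
    [ SYNSEM.LOCAL.GESTURE.OVERLAP + ].

For (2), your script would emit a throwaway lexicon per input, with one entry
per token carrying the timestamps, pitch accent and gesture description
directly (again, all names invented):

  ; generated-lexicon-utt042.tdl -- machine-generated, valid for one input only
  right_w3_utt042 := annotated-word-le &
    [ ORTH < "right" >,
      SYNSEM.LOCAL [ PACCENT h-star,
                     SPAN [ START "1.24", END "1.57" ],
                     GESTURE [ SHAPE pointing,
                               SPAN [ START "1.20", END "1.69" ] ] ] ].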

Best,

Glenn

Berthold Crysmann
2010-03-21 17:20:56 UTC
I wouldn't go down that road.
I would use the LKB with PET as a backend for parsing. PET now has
a "chartmapping" lattice-rewriting frontend which allows arbitrary
features to be injected into lexically instantiated feature structures
via an XML input format.
You'd have to do a fair amount of programming to get that to work.
I believe Peter Adolphs is the person you want to talk to about that.
I'm forwarding this to the developers list, on the off chance that
the relevant people may not be reading the LKB list.
Richard
To get an idea of how it works, there is the following documentation:

1. Our LREC 2008 paper (Adolphs et al.)
2. Peter's slides from the chartmapping tutorial on the Barcelona Wiki.
3. Grammars using chartmapping (ERG, GG, HaG, any others?)

There also used to be SAF, SMAF, etc. input support for the LKB, developed by
Ben Waldron; I don't know, though, how well that is maintained at present.
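
To give you a flavour of the idea: in the chartmapping setup each input token
is itself a typed feature structure, so in principle you can define additional
features on the token type and fill them from your annotation before the
tokens ever reach the lexicon. Very roughly, and with placeholder names rather
than the actual ERG/PET token inventory (the paper and the slides have the
real details):

  ; a token carrying gesture and timing annotation -- illustrative only
  annotated-token := my-token-type &
    [ +FORM "right",
      +GESTURE [ SHAPE pointing,
                 START "1.20", END "1.69" ] ].

The XML input format and the mapping rules that operate on such tokens are
what the documentation above describes.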

Hope this helps.

Berthold