Active Research

Research: Machine Reading and Memory

Goal

I believe that semantic and episodic memory can be (and, in all likelihood, must be) leveraged early in language processing and understanding if a machine reader is to comprehend language and integrate what it reads with existing knowledge. This hypothesis contrasts with the typical Natural Language Processing (NLP) pipeline model, which defers memory integration to later stages of processing and frequently does not address the problems of scale that arise when integrating with a large existing memory.

Memory and language understanding

Although it is generally accepted that memory and context play a crucial role in language comprehension, the question remains as to when this knowledge should be applied in machine language understanding. Current neuroscience research indicates that humans access deep semantic and episodic knowledge at very early stages of reading. It also shows that humans process syntax and semantics at the same time, suggesting that there is no syntax-then-semantics pipeline (Hagoort 2007). My research explores giving machine readers the same capability: the ability to access memory at all stages of language understanding.

Direct Memory Access Parsing (DMAP)

Direct Memory Access Parsing (DMAP) (Martin 1992) is a memory-driven, expectation-based, deep-semantic approach to natural language understanding. DMAP uses phrasal patterns linked directly to knowledge structures in memory to recursively recognize textual references and map them to existing knowledge, or to construct new knowledge with similar structure when appropriate.
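To make the recognition idea more concrete, here is a minimal Python sketch. It is not the DMAP implementation described on this page: the toy concepts, the abstraction hierarchy `ISA`, the pattern list `PATTERNS`, and the convention that capitalized pattern elements name concepts while lowercase elements are literal words are all hypothetical. It also simplifies DMAP's expectation-based, left-to-right prediction into a bottom-up loop that runs until nothing new is found. What it keeps is that pattern elements refer to concepts in memory, that an element expecting a concept can be satisfied by any specialization of that concept, and that recognizing one concept can complete the patterns of others (the recursion described above).

```python
# Hypothetical toy memory: each concept maps to its more abstract parent.
ISA = {
    "Dog": "Animal",
    "Cat": "Animal",
    "Animal": "Thing",
}

# Hypothetical phrasal patterns linking text to concepts in memory.
# Capitalized elements refer to concepts; lowercase elements are literal words.
PATTERNS = [
    ("Dog", ["dog"]),
    ("Cat", ["cat"]),
    ("Chasing", ["Animal", "chases", "Animal"]),
]


def isa(item, target):
    """True if item equals target or is a specialization of it in ISA."""
    while item is not None:
        if item == target:
            return True
        item = ISA.get(item)
    return False


def match(elements, start, found):
    """Yield each end position where elements match consecutive items from start."""
    if not elements:
        yield start
        return
    first, rest = elements[0], elements[1:]
    for item, s, e in list(found):  # iterate over a copy of the found set
        if s == start and isa(item, first):
            yield from match(rest, e, found)


def recognize(words):
    """Return every (item, start, end) reference found in words.

    Words seed the set; patterns are re-applied until nothing new is found,
    so concepts recognized in one pass can complete larger patterns later.
    """
    found = {(w, i, i + 1) for i, w in enumerate(words)}
    changed = True
    while changed:
        changed = False
        for concept, elements in PATTERNS:
            for start in range(len(words)):
                for end in list(match(elements, start, found)):
                    if (concept, start, end) not in found:
                        found.add((concept, start, end))
                        changed = True
    return found


if __name__ == "__main__":
    refs = recognize("dog chases cat".split())
    concepts = sorted(r for r in refs if r[0][0].isupper())
    print(concepts)  # → [('Cat', 2, 3), ('Chasing', 0, 3), ('Dog', 0, 1)]
```

`Chasing` is recognized over the whole sentence only because `Dog` and `Cat` were first recognized from their own patterns and each satisfied an `Animal` expectation through the hierarchy.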

To facilitate my research, I have built an implementation of DMAP on top of the ResearchCyc knowledge base contents, driven by the Fire reasoning engine. Beyond my original research questions, this work has raised several new problems. One is building a DMAP implementation that works with predicate logic assertions (as found in Cyc) rather than the frames for which DMAP was originally designed. Another is scale: ResearchCyc is three orders of magnitude larger than any memory previously used with a DMAP system, and memory-based methods for dealing with this scale have also become a research question.
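One part of the frames-versus-assertions mismatch is representational: frames group knowledge by object and slot, while Cyc stores individual assertions. The following Python sketch shows one naive way a frame-oriented algorithm could view assertions, by indexing each assertion under its first argument. This is an illustrative assumption, not the approach taken in the implementation described here; the function name `assertions_to_frames` and the sample assertions are hypothetical.

```python
from collections import defaultdict


def assertions_to_frames(assertions):
    """View predicate-logic assertions as frames keyed on their first argument.

    Each assertion is a tuple (predicate, arg1, arg2, ...). The frame for arg1
    gets a slot named after the predicate whose fillers are the remaining
    arguments, kept as tuples so that n-ary predicates are not truncated.
    Using the first argument as the frame is an arbitrary design choice.
    """
    frames = defaultdict(lambda: defaultdict(list))
    for predicate, subject, *rest in assertions:
        frames[subject][predicate].append(tuple(rest))
    return frames


if __name__ == "__main__":
    # Hypothetical assertions in a (predicate arg1 arg2 ...) style.
    kb = [
        ("isa", "Fido", "Dog"),
        ("owns", "Alice", "Fido"),
        ("isa", "Alice", "Person"),
    ]
    frames = assertions_to_frames(kb)
    print(dict(frames["Alice"]))  # → {'owns': [('Fido',)], 'isa': [('Person',)]}
    print(dict(frames["Fido"]))   # → {'isa': [('Dog',)]}
```

Note that Fido's frame records only that it is a `Dog`; the fact that Alice owns it appears only in Alice's frame. This is one reason a frame view over predicate logic is a design decision rather than a mechanical translation.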

DMAP is about bringing semantic and episodic memory to bear early and efficiently in the language understanding process. Semantic and episodic memory can be powerful resources for many NLP problems, such as coreference resolution, meaning formulation, and knowledge integration. Furthermore, given the end goal of integrating knowledge from reading with existing knowledge, I contend that this integration is made easier by operating on semantic and episodic structures, rather than lexical or linguistic structures, as early as possible in the parsing and understanding process.

Application: Learning Reader

DMAP is currently being used in the Learning Reader project, which is jointly advised by Chris Riesbeck, Ken Forbus, and Larry Birnbaum. We are trying to build a system that actually learns by reading, generating real knowledge from real text. A more extensive description of the Learning Reader is available on the website of QRG (Ken Forbus's research group). DMAP's role in the Learning Reader project is to support the automatic acquisition of new knowledge from simplified English texts. The Learning Reader knowledge base, extracted from ResearchCyc, is two orders of magnitude larger than the frame-based memories previously used by DMAP systems, with approximately 3 million predicate logic assertions and over 28,000 phrasal patterns.

. . . stay tuned for the dissertation

My publications page contains several papers with more detail.

Please contact me; I'd be happy to discuss this or related work with you, as well as potential employment opportunities (I intend to be on the job market very soon).

Some Previous Research

Information Retrieval (IR)

On Demand Querying Integrated with Television Viewing

I developed a system that builds websites on demand about whatever a person is currently watching on television and presents them in real time. For example, if you were watching the news and clicked the "more information" button on your remote control, the software would determine what you were watching, identify the specific story, and immediately send a custom-built micro-website to the laptop sitting next to you. The site would be full of links to related stories, as well as in-depth information about the key people, businesses, locations, etc. in the story. This work was published under the name "Beyond Broadcast", and the prototype was at times called "Cronkite". For more information, please see my publications page.

Problem Diagnosis and Answering

I also built a system called Pinpoint, which combines decision trees and statistical analysis of free text to drive search. More specifically, I built a tool that operates on an Industrial Engineering domain called "Factory Physics", which describes general problems and generalized solutions to those problems. This material was transformed into a kind of best-practice decision tree, which users could annotate with free-text descriptions of their specific problem. At any time, users could trigger automated searches of Internet and intranet knowledge repositories for more information. The goal was to drive search using descriptions of general problems, general solutions, and problem-specific information, in order to retrieve documents containing problem-specific solutions. For more information, please see my publications page.
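The idea of combining general and problem-specific text into one search can be illustrated with a minimal Python sketch. The statistical method Pinpoint actually used is not specified here, so everything below is an assumption: the `Node` structure, the tiny stopword list, the term-frequency weighting, the higher `user_weight` for the user's own description, and the sample Factory Physics-style text are all hypothetical. The sketch only shows how text along a decision-tree path and a user's annotation might be merged into a ranked list of query terms.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Small hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on",
             "our", "has", "is", "are", "with", "for"}


@dataclass
class Node:
    """A step in a best-practice decision tree (hypothetical structure)."""
    problem: str
    solution: str
    parent: Optional["Node"] = None


def terms(text):
    """Lowercase word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_query(node, user_notes, k=5, user_weight=2.0):
    """Pick the k highest-weighted terms from the tree path plus the user's notes.

    General problem and solution text along the path to the root adds 1.0
    per occurrence; the user's problem-specific description adds user_weight,
    so that retrieval leans toward problem-specific documents.
    """
    weights = Counter()
    while node is not None:
        for t in terms(node.problem) + terms(node.solution):
            weights[t] += 1.0
        node = node.parent
    for t in terms(user_notes):
        weights[t] += user_weight
    return [t for t, _ in weights.most_common(k)]


if __name__ == "__main__":
    root = Node("long lead times", "reduce batch sizes")
    child = Node("high WIP inventory", "set a WIP cap", parent=root)
    print(build_query(child, "our stamping press has long changeover times", k=3))
```

Here "long" and "times" rank highest because they appear both in the general tree text and in the user's description; the resulting terms would then be sent to whatever search back end is available.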