Publication | Closed Access
Maximal consistent interpretations of errorful data in hierarchically modelled domains
Year: 1977 | Citations: 18 | References: 1
Topics: Maximal Consistent Interpretations, Word Sequences, Engineering, Neurolinguistics, Semantic Processing, Language Understanding, Psycholinguistics, Uncertain Database, Uncertain Data, Uncertainty Formalism, Corpus Linguistics, Natural Language Processing, Data Science, Uncertainty Quantification, Computational Linguistics, Grammar, Language Studies, Statistics, Language Technology, Computer Science, Error Analysis, Automated Reasoning, Language Recognition, Statistical Inference, Text Processing, Domain Model, Linguistics, Model Analysis, Data Modeling
A method is presented for constructing maximal consistent interpretations of errorful data. The method appears applicable to many tasks (speech understanding, natural language understanding, vision, medical diagnosis) requiring partial matching of errorful data against complex, hierarchically defined patterns. The data is represented as symbolic structures (word sequences, line segment configurations, disease symptoms). Errors consist of missing data (unrecognized words, occluded lines, undetected symptoms) and extra (possibly inconsistent) data (incorrectly recognized words, visual noise, spurious symptoms). Data interpretations correspond to substructures of a hierarchy of concepts. Constraints on consistent interpretations are embedded in the structures of the predefined conceptual hierarchy. An implementation of the method has correctly interpreted sets of errorful sentence fragments recognized by the HEARSAY-II speech understanding system. The implementation has also correctly interpreted typed-in ungrammatical sentences. Detailed examples illustrate the operation of the method on real data.
1. INTRODUCTION
The application of AI methods to complex domains (e.g., speech, vision, medical diagnosis) has expanded the dimensions of data interpretation to incorporate some novel phenomena. Two of these phenomena are data error and hierarchically defined data patterns. Many complex domains are characterized by errorful data. Errors such as insertion, deletion, substitution, and repetition of information increase as the uncertainty in source data transduction increases.
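To make three of the named error types concrete, the sketch below aligns a recognized word sequence with the correct one using a standard word-level Levenshtein (edit-distance) alignment. This is only an illustration of the error vocabulary, not the interpretation method of this paper; the function name `align_errors` is hypothetical, and repetition errors are not distinguished from ordinary insertions. Here an "insertion" is an extra recognized word and a "deletion" is a missing one.

```python
# Hypothetical illustration (not the paper's method): classify word-level
# errors with a standard Levenshtein alignment.

def align_errors(recognized, correct):
    """Return (op, recognized_word, correct_word) tuples, where op is
    'match', 'substitution', 'insertion' (extra word) or 'deletion'
    (missing word)."""
    n, m = len(recognized), len(correct)
    # cost[i][j] = edit distance between recognized[:i] and correct[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if recognized[i - 1] == correct[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or substitution
                             cost[i - 1][j] + 1,        # extra recognized word
                             cost[i][j - 1] + 1)        # missing word
    # Trace back from the bottom-right cell to recover one optimal alignment.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        diag_ok = (i > 0 and j > 0 and
                   cost[i][j] == cost[i - 1][j - 1] + (recognized[i - 1] != correct[j - 1]))
        if diag_ok:
            op = "match" if recognized[i - 1] == correct[j - 1] else "substitution"
            ops.append((op, recognized[i - 1], correct[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("insertion", recognized[i - 1], None))
            i -= 1
        else:
            ops.append(("deletion", None, correct[j - 1]))
            j -= 1
    ops.reverse()
    return ops
```

For instance, aligning the recognized words WHAT HAS HERBERT against WHO HAS WRITTEN yields a substitution, a match, and a second substitution. An alignment like this presupposes that the correct sentence is known, which is exactly what an interpreter lacks; it serves here only to name the error types.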
Data may be mutually inconsistent, in that two or more pieces of information cannot be explained consistently by one interpretation. Tolerating error and inconsistencies in the data requires robust methods that can not only find the best interpretation but can also distinguish the inconsistent and errorful data from the consistent data.
Another aspect of data interpretation in complex domains is that interpretations represent complex, hierarchically defined concepts (ideas, rules, patterns) rather than simple, independent concepts (features). Often the concepts used in interpretations can be placed in a hierarchy where each concept is defined in terms of its subconcepts. This structure of concepts is called a conceptual hierarchy. A collection of data can then be interpreted by the highest concept in the hierarchy supported (validated) by the data. The interpretation of the data is defined by the concept's descendants (subconcepts, subsubconcepts, etc.) and the data which supports them. These descendants form a substructure of the conceptual hierarchy. The general data interpretation problem can now be restated as a search for the concept in the conceptual hierarchy that explains (is supported by) the most data. The data supporting the structure underlying this maximal concept can be described as the maximal consistent subset of data.
In this paper we define conceptual hierarchies and maximal consistent interpretations. We then describe a method for interpreting data in such an environment, i.e., finding maximal consistent interpretations in a conceptual hierarchy. Examples illustrating the method are shown. Finally, we show the actual application of the method to the problem of interpreting errorful sentence fragments recognized by the HEARSAY-II speech understanding system (Erman, 1975).
This work was supported in part by the Defense Advanced Research Projects Agency under contract no. F44620-73-C-0074 and monitored by the Air Force Office of Scientific Research. In addition, the first author was partially supported by a National Research Council of Canada Postgraduate Scholarship and the second author was partially supported by a National Science Foundation Graduate Fellowship.
2. A REAL EXAMPLE
The matching problem used as an example throughout this paper is taken from the HEARSAY-II speech understanding system. When HEARSAY-II is unable to completely recognize a spoken sentence (utterance), it generates a set of sentence fragments (Hayes-Roth et al., 1976c) which must be interpreted by the semantic interpretation module, named SEMANT. The generated fragments can be both errorful and mutually inconsistent (Example 2.1). A sentence fragment is a chunk of consistent data in that it consists of a grammatically plausible sequence of recognized words. HEARSAY-II mechanisms are effective in identifying such chunks but are not suited to combining them into an overall consistent interpretation of the utterance.
EXAMPLE 2.1
1: [ WHAT HAS HERBERT
2: PAPER ABOUT PATTERN MATCHING ]
3: IN LEARNING OR PATTERN MATCHING ]
4: [ WHO
Correct Sentence: [ WHO HAS WRITTEN ABOUT PATTERN MATCHING ]
Example 2.1 shows four sentence fragments generated when HEARSAY-II was unable to recognize the sentence [ WHO HAS WRITTEN ABOUT PATTERN MATCHING ]. The square brackets denote the start and finish of the spoken utterance. The numbers enclosed in angle brackets specify, in centiseconds, how long after the start of the utterance each fragment begins and ends. Fragment 4 correctly matches the initial portion of the spoken sentence. Fragments 1-3 contain substitution errors. Fragments 1 and 2 are mutually inconsistent in that they provide different interpretations of the overlapping time period. The fragment pairs 1 & 4 and 2 & 3 are inconsistent for the same reason. Also, Fragment 1 specifies a WHAT question whereas Fragment 4 specifies a WHO question. Thus Fragments 1 and 4 are semantically inconsistent, regardless of their times.
Each fragment is semantically described by a hierarchically structured collection of concepts. Figure 2.1 shows a portion of the conceptual hierarchy used by the SEMANT module in HEARSAY-II. Figure 2.2 shows the hierarchical description of the correct sentence. The problem of interpreting these fragments illustrates the phenomena of data error and hierarchically structured interpretations. The method used for solving this problem appears applicable to a significant class of problems exhibiting these two phenomena.
3. CONCEPTUAL HIERARCHIES
A conceptual hierarchy can be represented by a directed graph of concepts.
This graph is tree-structured in that it has a root at the top and leaf nodes at the bottom; however, cycles are permitted. The sons of a node define the subconcepts that compose the father. The root of the graph defines the highest level (most general) interpretation of all the concepts beneath it. A given interpretation task has
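The restated interpretation problem (find the concept whose descendants explain the most data) can be sketched on a toy graph. Everything here is a hypothetical assumption for illustration: the concept names in `HIERARCHY` are invented and are not SEMANT's hierarchy from Figure 2.1, the helper names `leaves` and `best_concept` are made up, and data items are treated as an unordered set of words. The self-loop on SUBJECT stands in for a recursive rule, which is why the traversal keeps a visited set.

```python
# Hypothetical toy conceptual hierarchy: concept -> list of sons (subconcepts).
# Any name that is not a key is a leaf, i.e., an observable datum (a word).
HIERARCHY = {
    "WHO-QUESTION": ["WHO", "HAS", "AUTHORSHIP", "TOPIC"],
    "WHAT-QUESTION": ["WHAT", "HAS", "PERSON", "AUTHORSHIP"],
    "AUTHORSHIP": ["WRITTEN"],
    "PERSON": ["HERBERT"],
    "TOPIC": ["ABOUT", "SUBJECT"],
    # Self-loop: a recursive rule such as "SUBJECT OR SUBJECT" (cycles permitted).
    "SUBJECT": ["PATTERN", "MATCHING", "LEARNING", "OR", "SUBJECT"],
}


def leaves(concept, hierarchy):
    """Collect all leaf descendants of `concept`; the visited set makes
    the depth-first traversal terminate even when the graph has cycles."""
    found, stack, visited = set(), [concept], set()
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        sons = hierarchy.get(node)
        if sons is None:
            found.add(node)  # no sons: a leaf
        else:
            stack.extend(sons)
    return found


def best_concept(data, hierarchy):
    """Return (concept, supporting data) for the concept whose leaves
    account for the most observed data. Concepts are scanned in sorted
    order and only a strictly larger support replaces the current best,
    so ties go to the name that sorts first."""
    best, best_support = None, set()
    for concept in sorted(hierarchy):
        support = leaves(concept, hierarchy) & set(data)
        if len(support) > len(best_support):
            best, best_support = concept, support
    return best, best_support


if __name__ == "__main__":
    observed = set("WHAT HAS HERBERT PAPER ABOUT PATTERN MATCHING IN LEARNING OR WHO".split())
    concept, support = best_concept(observed, HIERARCHY)
    print(concept, sorted(support), sorted(observed - support))
```

Given the distinct words of the four fragments in Example 2.1, this toy search selects WHO-QUESTION, supported by seven of the eleven words; WHAT, HERBERT, PAPER and IN remain unexplained, playing the role of extra data. Because the sketch ignores word order and fragment timing, it does not enforce the time-overlap and semantic consistency constraints discussed for Example 2.1, which a maximal consistent interpretation must also satisfy.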