当前位置:文档下载 > 所有分类 > IT/计算机 > 计算机软件及应用 > Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers
侵权投诉

Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers

Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers

FindingPatternsinaKnowledgeBaseusingKeywordsto

ComposeTableAnswers

MohanYang1

BolinDing2

1

UniversityofCalifornia,LosAngeles,CA,USA2

MicrosoftResearch,Redmond,WA,USA

SurajitChaudhuri2KaushikChakrabarti2

yang@cs.ucla.edu,

ABSTRACT

{bolind,surajitc,kaushik}@http://www.wendangxiazai.com

Tables[4]andMicrosoft’sExcelPowerQuery[3].Ourworkiscomplementarytothisline,andaimstocomposetablesinresponsetokeywordqueriesfrompatternsinknowledgebaseswhenthede-siredtablesarenotavailableoroflowqualityinthecorpus.

Thereareabundantsourcesofhigh-qualitystructureddata,calledknowledgebases:DBPedia[2],Freebase[6],andYago[9]areex-amplesofknowledgebasescontaininginformationongeneraltop-ics,whiletherearealsospecializedoneslikeIMDB[7]andDBLP[8].Aknowledgebasecontainsinformationaboutindividualenti-tiestogetherwithattributesrepresentingrelationshipsamongthem.Wecanmodelaknowledgebaseasadirectedgraph,calledknowl-edgegraph,withnodesrepresentingentitiesofdifferenttypesandedgesrepresentingrelationships,i.e.,attributes,amongentities.Wecan ndthesubtreesoftheknowledgegraphthatcontainallthekeywordsandreturntheminrankedorder(refertoYuetal.[46]andLiuetal.[32]forcomprehensivesurveys,andSec-tion6fordetaileddiscussion).However,itisnotadequatewhentheuser’squeryistolookforatableofentities.Ashasbeennoticedin[42],thereturnedsubtreeswithaheterogeneousmassofshapesmightcorrespondtodifferentinterpretationsofthequery,andthesubtreescorrespondingtocertaindesiredinterpretationmaynotap-pearcontiguouslyintherankedorder.Iftheuserwantstoexploreallsubtreesofthedesiredinterpretation,shehastoexamineallthereturnedsubtreesandmanuallygatherthosecorrespondingtotheinterpretation.Thisisextremelylaborintensive.Soweproposetoautomaticallyaggregatethesubtreesthatcontainallthekeywordsintodistinctinterpretationsandproducearankedlistofsuchaggre-gations.Structuralpatternofasubtreetogetherwiththemappingfromthekeywordstoitsnodes/edgesrepresentsaninterpretationofthequery,calledtreepattern.Weaggregatethesubtreesbasedontreepatterns.Ourworksharplycontrastsearlierworksonrank-ingsubtrees.Tothebestofourknowledge,thisisthe rstworkon ndingaggregationsofsubtreesongraphsforkeywordqueries.Inthispaper,weproposeandstudytheproblemof ndingrele-vantaggregationsofsubtreesintheknowledgegraphforagivenkeywordquery.Eachanswertothekeywordqueryisasetofsubtrees–eachsubtreecontainingallkeywordsandsatisfyingthesametreepattern.Suchanaggregationofsubtreescanbeoutputasatableofentityjoins,whereeachrowcorrespondstoasubtree.Whentherearemultiplepossibletreepatterns,theyareenumeratedandrankedbytheirrelevancetothequery.

EXAMPLE1.1.(MotivationExample)Figure1(a)-(c)isasmallpieceofaknowledgebasewiththreeentities.Foreachentity(e.g.,“SQLServer”,“Microsoft”,and“BillGates”),weknowitstype(e.g.,Software,Company,andPerson,respectively),andalistofattributes(leftcolumninFigure1(a)-(c))togetherwiththeirval-ues(rightcolumn).Thevalueofanattributemayeitherrefertoanotherentity,e.g.,“Developer”of“SQLServer”is“Microsoft”,

Weaimtoprovidetableanswerstokeywordqueriesusingaknowl-edgebase.Forqueriesreferringtomultipleentities,like“Wash-ingtoncitiespopulation”and“MelGibsonmovies”,itisbettertorepresenteachrelevantanswerasatablewhichaggregatesasetofentitiesorjoinsofentitieswithinthesametableschemeorpattern.Inthispaper,westudyhowto ndhighlyrelevantpatternsinaknowledgebaseforuser-givenkeywordqueriestocomposetableanswers.Aknowledgebaseismodeledasadirectedgraphcalledknowledgegraph,wherenodesrepresentitsentitiesandedgesrep-resenttherelationshipsamongthem.Eachnode/edgeislabeledwithtypeandtext.Apatternisanaggregationofsubtreeswhichcontainallkeywordsinthetextsandhavethesamestructureandtypesonnode/edges.Weproposeef cientalgorithmsto ndpat-ternsthatarerelevanttothequeryforaclassofscoringfunctions.Weshowthehardnessoftheproblemintheory,andproposepath-basedindexesthatareaffordableinmemory.Twoquery-processingalgorithmsareproposed:oneisfastinpracticeforsmallqueries(withsmallnumbersofpatternsasanswers)byutilizingthein-dexes;andtheotheroneisbetterintheory,withrunningtimelinearinthesizesofindexesandanswers,whichcanhandlelargequeriesbetter.Wealsoconductextensiveexperimentalstudytocompareourapproacheswithanaiveadaptionofknowntechniques.

1.INTRODUCTION

Usersoftenlookforinformationaboutsetsofentities,e.g.,intheformoftables[27,41,35].Forexample,ananalystwantsalistofcompaniesthatproducesdatabasesoftwarealongwiththeirannualrevenuesforthepurposeofmarketresearch.Orastudentwantsalistofuniversitiesinaparticularcountyalongwiththeirenrollmentnumbers,tuitionfeesand nancialendowmentinordertochoosewhichuniversitiestoseekadmissionin.

Toprovidesuchservices,someworksleveragethevastcorpusofHTMLtablesavailableontheWeb,tryingtointerpretthem,andreturnrelevantonesinresponsetokeywordqueries[27,41,35,44].Therearealsotwosuchcommercialtablesearchengines:GoogleWorkdonewhilevisitingMicrosoftResearch

ThisworkislicensedundertheCreativeCommonsAttribution-NonCommercial-NoDerivs3.0UnportedLicense.Toviewacopyofthisli-cense,visithttp://www.wendangxiazai.com/licenses/by-nc-nd/3.0/.Obtainper-missionpriortoanyusebeyondthosecoveredbythelicense.Contactcopyrightholderbyemailinginfo@http://www.wendangxiazai.com.Articlesfromthisvolumewereinvitedtopresenttheirresultsatthe40thInternationalConferenceonVeryLargeDataBases,September1st-5th2014,Hangzhou,China.ProceedingsoftheVLDBEndowment,Vol.7,No.14Copyright2014VLDBEndowment2150-8097/14/10.

第1页

免费下载Word文档免费下载:Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers

(下载1-12页,共12页)

猜你喜欢

TOP相关主题

返回顶部