Publication | Closed Access
Matrices, Vector Spaces, and Information Retrieval
Citations: 692
References: 39
Year: 1999
Engineering, Information Retrieval, Data Science, Data Mining, Vector Spaces, Intelligent Information Retrieval, Vector Space Model, Knowledge Discovery, Relevance Feedback, Vector Space, Text Indexing, Computer Science, Data Retrieval, Digital Libraries, Search Engine Indexing, Orthogonal Factorizations, Similarity Search, Text Mining
Digital libraries and the Internet have reshaped how information is processed, stored, and retrieved. Large-scale digitization efforts have exposed gaps in indexing and knowledge extraction. Recent vector-space retrieval technologies address these gaps by modeling data as a matrix and queries as vectors, with orthogonal factorizations of the matrix used to handle uncertainty in the database. The paper shows how these linear algebra concepts can be used to manage and index large text collections.
The evolution of digital libraries and the Internet has dramatically transformed the processing, storage, and retrieval of information. Efforts to digitize text, images, video, and audio now consume a substantial portion of both academic and industrial activity. Even when there is no shortage of textual materials on a particular topic, procedures for indexing or extracting the knowledge or conceptual information contained in them can be lacking. Recently developed information retrieval technologies are based on the concept of a vector space. Data are modeled as a matrix, and a user's query of the database is represented as a vector. Relevant documents in the database are then identified via simple vector operations. Orthogonal factorizations of the matrix provide mechanisms for handling uncertainty in the database itself. The purpose of this paper is to show how such fundamental mathematical concepts from linear algebra can be used to manage and index large text collections.
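The retrieval step described in the abstract (data as a matrix, query as a vector, relevance via simple vector operations) can be illustrated with a minimal sketch. It assumes cosine similarity as the vector operation and uses a hypothetical four-term, three-document collection; the names `TERMS`, `A`, `cosine`, and `rank_documents` are illustrative and not taken from the paper. The orthogonal factorization step is not covered here.

```python
import math

# Hypothetical toy term-by-document matrix (an assumption for illustration):
# each row is a term, each column is a document, entries are term counts.
TERMS = ["matrix", "vector", "retrieval", "library"]
A = [
    [1, 0, 1],  # "matrix" appears in documents 0 and 2
    [1, 0, 0],  # "vector" appears in document 0
    [0, 1, 1],  # "retrieval" appears in documents 1 and 2
    [0, 1, 0],  # "library" appears in document 1
]


def column(matrix, j):
    """Return column j of the matrix, i.e. the vector for document j."""
    return [row[j] for row in matrix]


def cosine(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(matrix, query):
    """Score every document column against the query, best match first."""
    n_docs = len(matrix[0])
    scores = [(j, cosine(column(matrix, j), query)) for j in range(n_docs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Query vector over the same term order: asks for "matrix" and "vector".
query = [1, 1, 0, 0]
ranking = rank_documents(A, query)
for doc_id, score in ranking:
    print(f"document {doc_id}: cosine = {score:.3f}")
```

In this toy collection, document 0 contains both query terms and ranks first, document 2 shares one term and ranks second, and document 1 shares none. A real system would typically weight the entries and apply a similarity threshold, choices this sketch leaves out.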