Eric D. Brill - Redmond WA, US Susan T. Dumais - Kirkland WA, US
Assignee:
Microsoft Corporation - Redmond WA
International Classification:
G06F 17/30
US Classification:
707 2, 707 5
Abstract:
Architecture for improving text searches using information redundancy. A search component is coupled with an analysis component to rerank documents returned in a search according to redundancy values. Each returned document is used to develop a corresponding word probability distribution that is further used to rerank the returned documents according to the associated redundancy values. In another aspect thereof, the query component is coupled with a projection component to project answer redundancy from one document search to another. This includes obtaining the benefit of considerable answer redundancy from a second data source by projecting the success of the search of the second data source against a first data source.
System, Representation, And Method Providing Multilevel Information Retrieval With Clarification Dialog
Susan T. Dumais - Kirkland WA, US Eric J. Horvitz - Kirkland WA, US
Assignee:
Microsoft Corporation - Redmond WA
International Classification:
G06F 17/30
US Classification:
707 3, 707 101, 707 102, 707 103 Y, 707 104.1
Abstract:
An information retrieval system, including a learning and real-time classification methodology, is provided in accordance with the present invention. The system includes a hierarchical analysis component that receives a query and processes probabilities associated with N categories, each category having one or more topics, wherein N is an integer. An interactive component drives clarification dialog that is derived from the query and the probabilities associated with the N categories and the one or more topics. The clarification dialog, driven by a rule-based policy, a decision-theoretic analysis considering the costs of dialog to focus the results versus the costs of browsing larger lists, or combinations of rules and decision-theoretic analysis, is employed when valuable to determine at least one category of the N categories to facilitate retrieval of at least one of the topics.
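The trade-off between dialog cost and browsing cost described in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the function name `should_ask`, the per-item scanning cost, and the fixed dialog cost are all assumptions introduced here for illustration. The sketch compares the cost of browsing one merged result list against the cost of asking a clarifying question and then browsing only the chosen category.

```python
def should_ask(category_probs, items_per_category, dialog_cost, cost_per_item=1.0):
    """Decide whether a clarification question is worth its cost.

    category_probs:      hypothetical mapping category -> P(category | query)
    items_per_category:  hypothetical mapping category -> number of results
    dialog_cost:         assumed fixed cost of asking the user one question
    cost_per_item:       assumed cost of scanning one result
    """
    # Without clarification the user scans the merged list of all categories.
    browse_cost = cost_per_item * sum(items_per_category.values())

    # With clarification the user pays for the dialog, then scans only the
    # category they pick; weight each category's list by its probability.
    focused_cost = dialog_cost + cost_per_item * sum(
        p * items_per_category[c] for c, p in category_probs.items()
    )
    return focused_cost < browse_cost, browse_cost, focused_cost


probs = {"sports": 0.5, "finance": 0.5}
ask, browse, focused = should_ask(probs, {"sports": 20, "finance": 20}, dialog_cost=5)
print(ask, browse, focused)
```

With long result lists the question pays for itself; with very short lists the same dialog cost outweighs the saving, so the sketch would skip the dialog.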
Probabilistic Models And Methods For Combining Multiple Content Classifiers
Susan T. Dumais - Kirkland WA, US Eric J. Horvitz - Kirkland WA, US Paul Nathan Bennett - Pittsburgh PA, US
Assignee:
Microsoft Corporation - Redmond WA
International Classification:
G06N 5/02
US Classification:
706 50, 706 12, 706 14
Abstract:
The invention applies a probabilistic approach to combining evidence regarding the correct classification of items. Training data and machine learning techniques are used to construct probabilistic dependency models that effectively utilize evidence. The evidence includes the outputs of one or more classifiers and optionally one or more reliability indicators. The reliability indicators are, in a broad sense, attributes of the items being classified. These attributes can include characteristics of an item, source of an item, and meta-level outputs of classifiers applied to the item. The resulting models include meta-classifiers, which combine evidence from two or more classifiers, and tuned classifiers, which use reliability indicators to inform the interpretation of classical classifier outputs. The invention also provides systems and methods for identifying new reliability indicators.
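As a rough illustration of a meta-classifier that combines classifier outputs with a reliability indicator, the Python sketch below estimates the probability of the positive class from a smoothed count table. The class name `TableMetaClassifier`, the binary votes, and the `is_short_doc` indicator are hypothetical; the abstract does not specify a model family, and a count table is only one simple stand-in for a probabilistic dependency model.

```python
from collections import defaultdict


class TableMetaClassifier:
    """Toy meta-classifier: P(label = 1 | evidence) from smoothed counts.

    Each evidence row is a tuple of base-classifier votes followed by
    reliability indicators (all hypothetical, discrete values).
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength (assumed value)
        self.counts = defaultdict(lambda: [0, 0])  # evidence -> [neg, pos]

    def fit(self, evidence_rows, labels):
        for evidence, label in zip(evidence_rows, labels):
            self.counts[tuple(evidence)][int(label)] += 1
        return self

    def predict_proba(self, evidence):
        # Unseen evidence falls back to the uniform prior of 0.5.
        neg, pos = self.counts.get(tuple(evidence), (0, 0))
        return (pos + self.alpha) / (neg + pos + 2 * self.alpha)


# Evidence layout: (vote_a, vote_b, is_short_doc). In this made-up data,
# classifier A is right on long documents and B is right on short ones.
rows = [(1, 0, 0)] * 3 + [(1, 0, 1)] * 3 + [(0, 1, 1)] * 3 + [(0, 1, 0)] * 3
labels = [1] * 3 + [0] * 3 + [1] * 3 + [0] * 3
model = TableMetaClassifier().fit(rows, labels)
print(model.predict_proba((1, 0, 0)), model.predict_proba((1, 0, 1)))
```

The same pair of votes yields a different probability depending on the indicator, which is the behavior the abstract attributes to reliability indicators.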
Utilizing Information Redundancy To Improve Text Searches
Eric D. Brill - Redmond WA, US Susan T. Dumais - Kirkland WA, US
Assignee:
Microsoft Corporation - Redmond WA
International Classification:
G06F 17/30
US Classification:
707 2, 707 5
Abstract:
Architecture for improving text searches using information redundancy. A search component is coupled with an analysis component to rerank documents returned in a search according to redundancy values. Each returned document is used to develop a corresponding word probability distribution that is further used to rerank the returned documents according to the associated redundancy values. In another aspect thereof, the query component is coupled with a projection component to project answer redundancy from one document search to another. This includes obtaining the benefit of considerable answer redundancy from a second data source by projecting the success of the search of the second data source against a first data source.
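One way to picture redundancy-based reranking is the Python sketch below. It builds a unigram probability distribution per returned document and scores each document by its average cosine similarity to the others, so documents that repeat what the rest of the result set says move up. The similarity measure and the function names are assumptions made for illustration; the abstract does not state how the redundancy value is computed.

```python
import math
from collections import Counter


def word_distribution(text):
    """Unigram probability distribution over lowercase whitespace tokens."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def cosine(p, q):
    """Cosine similarity between two sparse distributions."""
    dot = sum(value * q.get(word, 0.0) for word, value in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return dot / norm if norm else 0.0


def rerank_by_redundancy(docs):
    """Return (doc, score) pairs sorted by assumed redundancy score."""
    dists = [word_distribution(d) for d in docs]
    scores = []
    for i, p in enumerate(dists):
        others = [cosine(p, q) for j, q in enumerate(dists) if j != i]
        scores.append(sum(others) / len(others) if others else 0.0)
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [(docs[i], scores[i]) for i in order]


results = [
    "paris is the capital of france",
    "the capital of france is paris",
    "bananas are yellow fruit",
]
for doc, score in rerank_by_redundancy(results):
    print(f"{score:.2f}  {doc}")
```

The two documents that agree with each other rank above the unrelated one, which shares no words with the rest of the set.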
Method And System For Usage Analyzer That Determines User Accessed Sources, Indexes Data Subsets, And Associated Metadata, Processing Implicit Queries Based On Potential Interest To Users
Susan T. Dumais - Kirkland WA, US Eric J. Horvitz - Kirkland WA, US Edward B. Cutrell - Seattle WA, US Jonathan J. Cadiz - Redmond WA, US Gavin Jancke - Sammamish WA, US Raman K. Sarin - Redmond WA, US Daniel C. Robbins - Seattle WA, US Anoop Gupta - Woodinville WA, US George G. Robertson - Seattle WA, US Meredith J. Ringel - Stanford CA, US Jeremy Goecks - Atlanta GA, US
Abstract:
The present invention relates to systems and methods providing content-access-based information retrieval. Information items from a plurality of disparate information sources that have been previously accessed or considered are automatically indexed in a data store, whereby a multifaceted user interface is provided to efficiently retrieve the items in a cognitively relevant manner. Various display output arrangements are possible for the retrieved information items including timeline visualizations and multidimensional grid visualizations. Input options include explicit, implicit, and standing queries for retrieving data along with explicit and implicit tagging of items for ease of recall and retrieval. In one aspect, an automated system is provided that facilitates concurrent searching across a plurality of information sources. A usage analyzer determines user accessed items and a content analyzer stores subsets of data corresponding to the items, wherein at least two of the items are associated with disparate information sources, respectively.
Systems And Methods For Performing Background Queries From Content And Activity
Susan T. Dumais - Kirkland WA, US Eric J. Horvitz - Kirkland WA, US Edward B. Cutrell - Seattle WA, US Raman K. Sarin - Redmond WA, US
Assignee:
Microsoft Corporation - Redmond WA
International Classification:
G06F 17/30
US Classification:
707 5, 707 10
Abstract:
Most information retrieval systems start with a user's explicit query. Systems and methods are provided that perform implicit or background queries to one or more information sources based on the ongoing activities of users. The methods provide users with the results of such automated contextualized searches in an unobtrusive manner. In one aspect, implicit queries are run when users are reading, working on or composing an application. Queries can be automatically generated by analyzing an application, and results can be presented in a variety of peripheral display configurations, including a small pane adjacent to a current window to provide peripheral awareness of related information that is automatically determined from existing user context and/or related content from the application. The invention includes methods for building models that predict the value of different queries, and of the results generated by such queries, based on logged data, and for using such models to control query formulation and to mediate decisions about displaying the results of implicit queries.
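A background query of the kind described here could, in the simplest case, be formed from the most frequent content words of whatever the user is currently reading or writing. The Python sketch below does only that; the function name `implicit_query`, the tiny stopword list, and the term-frequency heuristic are assumptions for illustration and are not taken from the abstract, which also describes learned models for deciding which queries and results are worth showing.

```python
import re
from collections import Counter

# Deliberately small, hypothetical stopword list for the sketch.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def implicit_query(text, max_terms=3):
    """Build a query from the most frequent non-stopword terms in `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [term for term, _ in counts.most_common(max_terms)]


draft = "Budget review meeting: the budget for the project and the project timeline"
print(implicit_query(draft, max_terms=2))
```

The resulting terms would then be sent to a search backend without the user typing anything.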
Principles And Methods For Personalizing Newsfeeds Via An Analysis Of Information Novelty And Dynamics
Susan T. Dumais - Kirkland WA, US Eric J. Horvitz - Kirkland WA, US Evgeniy Gabrilovich - Herzlia, IL
Assignee:
Microsoft Corporation - Redmond WA
International Classification:
G06F 17/30 G06F 7/00
US Classification:
707 5, 707 6, 707 101
Abstract:
A system and methodology are provided for filtering temporal streams of information such as news stories by statistical measures of information novelty. Various techniques can be applied to custom-tailor news feeds or other types of information based on information that a user has already reviewed. Methods for analyzing information novelty are provided along with a system that personalizes and filters information for users by identifying the novelty of stories in the context of stories they have already reviewed. The system employs novelty-analysis algorithms that represent articles as a bag of words and named entities. The algorithms analyze inter- and intra-document dynamics by considering how information evolves over time from article to article, as well as within individual articles.
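The bag-of-words novelty idea can be sketched in Python as follows. The score used here, the fraction of an article's words that do not appear in anything the user has already read, is an assumption chosen for simplicity; the abstract does not name a specific statistic, and named entities are omitted. The function names and the threshold are hypothetical.

```python
def novelty(article, seen_articles):
    """Fraction of words in `article` absent from all previously read articles."""
    seen_words = set()
    for previous in seen_articles:
        seen_words.update(previous.lower().split())
    words = article.lower().split()
    if not words:
        return 0.0
    return sum(1 for w in words if w not in seen_words) / len(words)


def filter_novel(stream, threshold=0.5):
    """Keep only articles whose novelty, relative to kept articles, meets the threshold."""
    read = []
    for article in stream:
        if novelty(article, read) >= threshold:
            read.append(article)  # the user is assumed to read what is shown
    return read


stream = [
    "storm hits coast",
    "storm hits coast again",
    "rescue teams arrive inland",
]
print(filter_novel(stream))
```

The second story repeats most of the first and is filtered out, while the third brings entirely new words and is kept.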
Cost-Benefit Approach To Automatically Composing Answers To Questions By Extracting Information From Large Unstructured Corpora
Eric J. Horvitz - Kirkland WA, US David R. Azari - Seattle WA, US Susan T. Dumais - Kirkland WA, US Eric D. Brill - Redmond WA, US
Assignee:
Microsoft Corporation - Redmond WA
International Classification:
G06F 17/00 G06F 17/30 G06N 5/02
US Classification:
706 46, 706 50, 707 3
Abstract:
The present invention relates to a system and methodology to facilitate extraction of information from large unstructured corpora such as from the World Wide Web and/or other unstructured sources. Information in the form of answers to questions can be automatically composed from such sources via probabilistic models and cost-benefit analyses to guide resource-intensive information-extraction procedures employed by a knowledge-based question answering system. The analyses can leverage predictions of the ultimate quality of answers generated by the system provided by Bayesian or other statistical models. Such predictions, when coupled with a utility model, can provide the system with the ability to make decisions about the number of queries issued to a search engine (or engines), given the cost of queries and the expected value of query results in refining an ultimate answer. Given a preference model, information extraction actions can be taken with the highest expected utility. In this manner, the accuracy of answers to questions can be balanced with the cost of information extraction and analysis to compose the answers.
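The decision about how many queries to issue can be framed as maximizing expected utility, as in the Python sketch below. The quality curve `predicted_quality` is a made-up diminishing-returns function standing in for the Bayesian or other statistical model the abstract mentions, and the linear utility (value of a correct answer times predicted quality, minus a per-query cost) is likewise an assumption for illustration.

```python
import math


def predicted_quality(n_queries, rate=0.5):
    """Hypothetical stand-in for a learned model: P(answer is correct)
    after n queries, with diminishing returns."""
    return 1.0 - math.exp(-rate * n_queries)


def best_query_count(max_queries, value_of_correct_answer, cost_per_query,
                     quality=predicted_quality):
    """Pick the number of queries with the highest expected utility."""

    def expected_utility(n):
        return value_of_correct_answer * quality(n) - cost_per_query * n

    return max(range(max_queries + 1), key=expected_utility)


print(best_query_count(max_queries=5, value_of_correct_answer=10, cost_per_query=1))
```

Raising the per-query cost pushes the chosen count toward zero, and making queries free pushes it to the allowed maximum, which is the balance between answer accuracy and extraction cost that the abstract describes.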