Stewart Yang - San Jose CA, US; Fang Liu - Beijing, CN; Pei Cao - Palo Alto CA, US
Assignee:
Google Inc. - Mountain View CA
International Classification:
G06F 17/30
US Classification:
707736, 707749
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying non-compositional compounds. In one aspect, a method includes the actions of receiving a collection of phrases, each phrase including two or more words; for each phrase, determining if the phrase is a non-compositional compound, a non-compositional compound being a phrase of two or more words where the words composing the phrase have different meanings in the compound than their individual conventional meanings, the determining including: identifying a similar term for a term of the phrase, substituting the similar term for the term of the phrase to generate a substitute phrase, calculating a similarity between the phrase and the substitute phrase, and identifying the phrase as a non-compositional compound when the calculated similarity is less than a specified threshold value.
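The substitution test in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how similar terms are found or how similarity is calculated, so the lookup table `SIMILAR_TERMS`, the toy context counts in `CONTEXTS`, the cosine measure, the 0.5 threshold, and all function names are hypothetical.

```python
import math

# Hypothetical toy data (not from the patent): a similar-term lookup and
# context-word counts observed around each phrase.
SIMILAR_TERMS = {"hot": "warm", "red": "crimson"}
CONTEXTS = {
    "hot dog": {"bun": 5, "mustard": 4, "eat": 3},
    "warm dog": {"pet": 4, "blanket": 3, "sleep": 2},
    "red car": {"drive": 5, "fast": 3, "park": 2},
    "crimson car": {"drive": 4, "fast": 2, "paint": 1},
}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def substitute_phrase(phrase):
    """Replace the first word that has a known similar term."""
    words = phrase.split()
    for i, word in enumerate(words):
        if word in SIMILAR_TERMS:
            return " ".join(words[:i] + [SIMILAR_TERMS[word]] + words[i + 1:])
    return None


def is_non_compositional(phrase, threshold=0.5):
    """Flag the phrase when it and its substitute phrase are dissimilar."""
    substitute = substitute_phrase(phrase)
    if substitute is None or phrase not in CONTEXTS or substitute not in CONTEXTS:
        return False  # Not enough data to decide in this toy setup.
    return cosine(CONTEXTS[phrase], CONTEXTS[substitute]) < threshold


print(is_non_compositional("hot dog"))
print(is_non_compositional("red car"))
```

With this toy data, "hot dog" and "warm dog" share no context words, so the phrase is flagged, while "red car" and "crimson car" occur in similar contexts and the phrase is treated as compositional.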
Stewart Yang - Sunnyvale CA, US; Fang Liu - Beijing, CN; Dekang Lin - Cupertino CA, US; Hongjun Zhu - Los Altos CA, US
Assignee:
Google Inc. - Mountain View CA
International Classification:
G06F 7/00 G06F 17/30
US Classification:
707767, 707736, 704 10
Abstract:
Aspects directed to phrase generation are provided. A method is provided that includes identifying a plurality of phrase candidates from a plurality of text string entries in a corpus. For each phrase candidate: identifying a plurality of left contexts and a plurality of right contexts for the phrase candidate, each left context of the plurality of left contexts being a nearest unique feature to the left of the phrase candidate in a text string entry and each right context of the plurality of right contexts being the nearest unique feature to the right of the phrase candidate, and calculating a left context vector including a score for each left context feature and a right context vector including a score for each right context feature of the phrase candidate. A similarity is determined between pairs of phrase candidates using the respective left and right context vectors for each phrase candidate.
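A minimal sketch of the context-vector comparison described in this abstract, under several assumptions: features are taken to be whitespace-separated tokens, the nearest token on each side of a phrase candidate serves as its context, scores are relative frequencies, and the similarity of two candidates is the average of the cosine similarities of their left and right vectors. None of these choices, nor the names `context_vectors` and `phrase_similarity` or the sample entries, come from the patent.

```python
import math
from collections import Counter


def normalize(counts):
    """Turn raw counts into relative-frequency scores."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()} if total else {}


def context_vectors(phrase, entries):
    """Collect the nearest token on each side of every occurrence of phrase."""
    left, right = Counter(), Counter()
    target = phrase.split()
    n = len(target)
    for entry in entries:
        tokens = entry.split()
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == target:
                if i > 0:
                    left[tokens[i - 1]] += 1
                if i + n < len(tokens):
                    right[tokens[i + n]] += 1
    return normalize(left), normalize(right)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def phrase_similarity(p, q, entries):
    """Average the similarity of the left vectors and of the right vectors."""
    left_p, right_p = context_vectors(p, entries)
    left_q, right_q = context_vectors(q, entries)
    return (cosine(left_p, left_q) + cosine(right_p, right_q)) / 2


# Hypothetical text string entries, for illustration only.
ENTRIES = [
    "cheap new york hotels",
    "cheap san francisco hotels",
    "flights to new york today",
    "flights to san francisco today",
    "buy used cars online",
]

print(phrase_similarity("new york", "san francisco", ENTRIES))
print(phrase_similarity("new york", "used cars", ENTRIES))
```

In this toy corpus, "new york" and "san francisco" appear between the same neighbouring tokens and score close to 1, while "new york" and "used cars" share no contexts and score 0.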
Xin Liu - San Jose CA, US; Stewart Yang - Sunnyvale CA, US
Assignee:
Google Inc. - Mountain View CA
International Classification:
G06F 17/28
US Classification:
704 4, 704 7, 704 8, 704 9, 704 10
Abstract:
Methods, systems and apparatus, including computer program products, for identifying properties of an electronic document. In one aspect, a sequence of bytes representing text in a document is received. A plurality of byte-n-grams are identified from the bytes. For multiple encodings, a respective likelihood of each byte-n-gram occurring in each of the respective multiple encodings is identified. A respective encoding score for each of the multiple encodings is determined. A most likely encoding of the document is identified based on a highest encoding score among the encoding scores. In another aspect, a sequence of characters, having an encoding, are identified in a document. The sequence is segmented into features, each corresponding to two or more characters. A respective score for each of multiple languages is determined based on the features and a respective language model. A language of the document is identified based on the scores.
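The encoding-detection aspect of this abstract can be illustrated with a small sketch. It assumes byte bigrams as the n-grams, add-one smoothed log-likelihoods summed into an encoding score, and models trained on a single short Russian sample; the patent abstract specifies none of these details, and the names `train_model`, `encoding_score`, and `detect_encoding` are hypothetical. The language-identification aspect is not covered here.

```python
import math
from collections import Counter


def byte_ngrams(data, n=2):
    """Return all overlapping byte n-grams in a byte sequence."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def train_model(sample, n=2):
    """Count byte n-grams in sample bytes known to use one encoding."""
    counts = Counter(byte_ngrams(sample, n))
    return counts, sum(counts.values())


def encoding_score(data, model, n=2):
    """Sum of add-one smoothed log-likelihoods of the data's byte n-grams."""
    counts, total = model
    vocabulary = 256 ** n  # Number of possible byte n-grams.
    return sum(
        math.log((counts[gram] + 1) / (total + vocabulary))
        for gram in byte_ngrams(data, n)
    )


def detect_encoding(data, models, n=2):
    """Pick the encoding whose model gives the highest score."""
    scores = {name: encoding_score(data, model, n) for name, model in models.items()}
    return max(scores, key=scores.get), scores


# Assumed training text; a real system would use far larger samples.
SAMPLE = "Привет, как дела? Это пример текста на русском языке."
MODELS = {
    name: train_model(SAMPLE.encode(name))
    for name in ("utf-8", "koi8_r", "cp1251")
}

document = "Это текст".encode("koi8_r")
best, _ = detect_encoding(document, MODELS)
print(best)
```

Summing log-likelihoods rather than multiplying probabilities avoids numeric underflow on long documents, and the smoothing keeps unseen byte n-grams from driving a score to negative infinity.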
Generating Phrase Candidates From Text String Entries
Stewart Yang - Sunnyvale CA, US; Fang Liu - China, CN; Dekang Lin - Cupertino CA, US; Hongjun Zhu - Los Altos CA, US
Assignee:
Google Inc. - Mountain View CA
International Classification:
G06F 17/30
US Classification:
707767, 707736, 704 10
Abstract:
Aspects directed to phrase generation are provided. A method is provided that includes identifying a plurality of phrase candidates from a plurality of text string entries in a corpus. For each phrase candidate: identifying a plurality of left contexts and a plurality of right contexts for the phrase candidate, each left context of the plurality of left contexts being a nearest unique feature to the left of the phrase candidate in a text string entry and each right context of the plurality of right contexts being the nearest unique feature to the right of the phrase candidate, and calculating a left context vector including a score for each left context feature and a right context vector including a score for each right context feature of the phrase candidate. A similarity is determined between pairs of phrase candidates using the respective left and right context vectors for each phrase candidate.
Stewart Yang - San Jose CA, US; Fang Liu - Beijing, CN; Pei Cao - Palo Alto CA, US
Assignee:
Google Inc. - Mountain View CA
International Classification:
G06F 17/30
US Classification:
707736, 707749
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying non-compositional compounds. In one aspect, a method includes the actions of receiving a collection of phrases, each phrase including two or more words; for each phrase, determining if the phrase is a non-compositional compound, a non-compositional compound being a phrase of two or more words where the words composing the phrase have different meanings in the compound than their individual conventional meanings, the determining including: identifying a similar term for a term of the phrase, substituting the similar term for the term of the phrase to generate a substitute phrase, calculating a similarity between the phrase and the substitute phrase, and identifying the phrase as a non-compositional compound when the calculated similarity is less than a specified threshold value.
Googleplus
Stewart Yang
Education:
University of Warwick