Stewart Yang - San Jose CA, US Fang Liu - Beijing, CN Pei Cao - Palo Alto CA, US
Assignee:
Google Inc. - Mountain View CA
International Classification:
G06F 17/30
US Classification:
707736, 707749
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying non-compositional compounds. In one aspect, a method includes the actions of receiving a collection of phrases, each phrase including two or more words; for each phrase, determining if the phrase is a non-compositional compound, a non-compositional compound being a phrase of two or more words where the words composing the phrase have different meanings in the compound than their conventional meanings individually, the determining including: identifying a similar term for a term of the phrase, substituting the similar term for the term of the phrase to generate a substitute phrase, calculating a similarity between the phrase and the substitute phrase, and identifying the phrase as a non-compositional compound when the calculated similarity is less than a specified threshold value.
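The substitution test described in this abstract can be illustrated with a minimal sketch. The abstract does not say how similarity is calculated, so cosine similarity over toy context-count vectors is an assumption here, as are the names `cosine`, `is_non_compositional`, `vectors`, and `similar_terms`, the sample data, and the choice to flag a phrase as soon as any single substitution falls below the threshold.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_non_compositional(phrase, similar_terms, vectors, threshold):
    """Return True if swapping in a similar term changes the phrase's usage.

    Assumption: one low-similarity substitution is enough to flag the phrase.
    """
    words = phrase.split()
    for i, word in enumerate(words):
        for alt in similar_terms.get(word, []):
            substitute = " ".join(words[:i] + [alt] + words[i + 1:])
            if substitute not in vectors:
                continue  # no usage data for this substitute phrase
            if cosine(vectors[phrase], vectors[substitute]) < threshold:
                return True
    return False


# Hypothetical context counts (words seen near each phrase in some corpus).
vectors = {
    "hot dog": {"eat": 5, "mustard": 4, "bun": 3},
    "warm dog": {"pet": 4, "blanket": 2},
    "hot puppy": {"pet": 3, "summer": 2},
    "red car": {"drive": 5, "park": 3},
    "crimson car": {"drive": 4, "park": 2},
}
# Hypothetical similar-term lookup.
similar_terms = {"hot": ["warm"], "dog": ["puppy"], "red": ["crimson"]}

hot_dog_flag = is_non_compositional("hot dog", similar_terms, vectors, 0.5)
red_car_flag = is_non_compositional("red car", similar_terms, vectors, 0.5)
```

With this toy data, "hot dog" shares no contexts with "warm dog" and is flagged, while "red car" and "crimson car" appear in nearly identical contexts and are not.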
Stewart Yang - Sunnyvale CA, US Fang Liu - Beijing, CN Dekang Lin - Cupertino CA, US Hongjun Zhu - Los Altos CA, US
Assignee:
Google Inc. - Mountain View CA
International Classification:
G06F 7/00 G06F 17/30
US Classification:
707767, 707736, 704 10
Abstract:
Aspects directed to phrase generation are provided. A method is provided that includes identifying a plurality of phrase candidates from a plurality of text string entries in a corpus. For each phrase candidate: identifying a plurality of left contexts and a plurality of right contexts for the phrase candidate, each left context of the plurality of left contexts being a nearest unique feature to the left of the phrase candidate in a text string entry and each right context of the plurality of right contexts being the nearest unique feature to the right of the phrase candidate, and calculating a left context vector including a score for each left context feature and a right context vector including a score for each right context feature of the phrase candidate. A similarity is determined between pairs of phrase candidates using the respective left and right context vectors for each phrase candidate.
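A rough sketch of the context-vector comparison in this abstract follows. Several details are assumptions made only for illustration: the "nearest unique feature" is taken to be the single adjacent token, each score is a raw count, and the pair similarity is the average of the left and right cosine similarities. The function names and sample entries are hypothetical.

```python
import math
from collections import Counter


def context_vectors(entries, phrase):
    """Count the tokens immediately left and right of each phrase occurrence."""
    target = phrase.split()
    n = len(target)
    left, right = Counter(), Counter()
    for entry in entries:
        tokens = entry.split()
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == target:
                if i > 0:
                    left[tokens[i - 1]] += 1
                if i + n < len(tokens):
                    right[tokens[i + n]] += 1
    return left, right


def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(w * b.get(k, 0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def phrase_similarity(entries, phrase_a, phrase_b):
    """Average of left-context and right-context cosine (an assumed combination)."""
    left_a, right_a = context_vectors(entries, phrase_a)
    left_b, right_b = context_vectors(entries, phrase_b)
    return 0.5 * (cosine(left_a, left_b) + cosine(right_a, right_b))


# Hypothetical text string entries, for example search queries.
entries = [
    "cheap new york hotels",
    "cheap san francisco hotels",
    "flights to new york today",
    "flights to san francisco today",
    "cheap pizza delivery",
]

city_pair = phrase_similarity(entries, "new york", "san francisco")
mixed_pair = phrase_similarity(entries, "new york", "pizza delivery")
```

In this toy corpus the two city names occur between the same neighbours, so `city_pair` is higher than `mixed_pair`.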
Xin Liu - San Jose CA, US Stewart Yang - Sunnyvale CA, US
Assignee:
Google Inc. - Mountain View CA
International Classification:
G06F 17/28
US Classification:
704 4, 704 7, 704 8, 704 9, 704 10
Abstract:
Methods, systems and apparatus, including computer program products, for identifying properties of an electronic document. In one aspect, a sequence of bytes representing text in a document is received. A plurality of byte-n-grams are identified from the bytes. For multiple encodings, a respective likelihood of each byte-n-gram occurring in each of the respective multiple encodings is identified. A respective encoding score for each of the multiple encodings is determined. A most likely encoding of the document is identified based on a highest encoding score among the encoding scores. In another aspect, a sequence of characters, having an encoding, is identified in a document. The sequence is segmented into features, each corresponding to two or more characters. A respective score for each of multiple languages is determined based on the features and a respective language model. A language of the document is identified based on the scores.
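The encoding-detection aspect can be sketched as follows. This is only an illustration under stated assumptions: byte bigrams are used as the n-grams, each encoding's model is estimated from a few hypothetical sample strings, the score is a sum of log-likelihoods, and unseen bigrams receive an arbitrary floor probability. None of these choices come from the abstract.

```python
import math
from collections import Counter


def byte_ngrams(data, n=2):
    """Return all overlapping byte n-grams of a bytes object."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def train_model(samples, n=2):
    """Estimate n-gram probabilities from sample byte strings."""
    counts = Counter()
    for sample in samples:
        counts.update(byte_ngrams(sample, n))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def detect_encoding(data, models, n=2, floor=1e-6):
    """Score each encoding by summed log-likelihood and pick the highest.

    Assumption: n-grams missing from a model get the probability `floor`.
    """
    scores = {}
    for name, model in models.items():
        scores[name] = sum(
            math.log(model.get(gram, floor)) for gram in byte_ngrams(data, n)
        )
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical training text, encoded once per candidate encoding.
training_text = ["café crème brûlée", "naïve façade"]
models = {
    enc: train_model([text.encode(enc) for text in training_text])
    for enc in ("utf-8", "latin-1")
}

guess_latin, _ = detect_encoding("déjà vu café".encode("latin-1"), models)
guess_utf8, _ = detect_encoding("déjà vu café".encode("utf-8"), models)
```

A realistic detector would need far larger training samples and many more candidate encodings; the toy models here are just large enough to separate the two cases.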
Generating Phrase Candidates From Text String Entries
Stewart Yang - Sunnyvale CA, US Fang Liu - China, CN Dekang Lin - Cupertino CA, US Hongjun Zhu - Los Altos CA, US
Assignee:
Google Inc. - Mountain View CA
International Classification:
G06F 17/30
US Classification:
707767, 707736, 704 10
Abstract:
Aspects directed to phrase generation are provided. A method is provided that includes identifying a plurality of phrase candidates from a plurality of text string entries in a corpus. For each phrase candidate: identifying a plurality of left contexts and a plurality of right contexts for the phrase candidate, each left context of the plurality of left contexts being a nearest unique feature to the left of the phrase candidate in a text string entry and each right context of the plurality of right contexts being the nearest unique feature to the right of the phrase candidate, and calculating a left context vector including a score for each left context feature and a right context vector including a score for each right context feature of the phrase candidate. A similarity is determined between pairs of phrase candidates using the respective left and right context vectors for each phrase candidate.
Stewart Yang - San Jose CA, US Fang Liu - Beijing, CN Pei Cao - Palo Alto CA, US
Assignee:
Google Inc. - Mountain View CA
International Classification:
G06F 17/30
US Classification:
707736, 707749
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying non-compositional compounds. In one aspect, a method includes the actions of receiving a collection of phrases, each phrase including two or more words; for each phrase, determining if the phrase is a non-compositional compound, a non-compositional compound being a phrase of two or more words where the words composing the phrase have different meanings in the compound than their conventional meanings individually, the determining including: identifying a similar term for a term of the phrase, substituting the similar term for the term of the phrase to generate a substitute phrase, calculating a similarity between the phrase and the substitute phrase, and identifying the phrase as a non-compositional compound when the calculated similarity is less than a specified threshold value.
Googleplus
Stewart Yang
Education:
University of Warwick