Abstract:
In one embodiment, access one or more pairs of search query and clicked Uniform Resource Locator (URL). For each of the pairs of search query and clicked URL, segment the search query into one or more query segments and the clicked URL into one or more URL segments; construct one or more query-URL n-grams, each of which comprises a query part comprising at least one of the query segments and a URL part comprising at least one of the URL segments; and calculate one or more association scores, each of which for one of the query-URL n-grams and represents a similarity between the query part and the URL part of the query-URL n-gram and is based on a first frequency of the query part and the URL part, a second frequency of the query part, and a third frequency of the URL part.