Steven C. Almo - Pelham NY, US; Andras Fiser - Astoria NY, US; Rotem Rubinstein - Jackson Heights NY, US
International Classification:
G06F 19/20, G06F 19/22
US Classification:
702 20
Abstract:
The present invention provides a method of determining related proteins, the method comprising obtaining sequences of interest, wherein the sequences are amino acid sequences for proteins or nucleotide sequences encoding proteins; comparing segments of each sequence of interest with a database of amino acid or nucleotide sequences; generating a profile for each sequence of interest comprising a list of all sequences from the database of sequences that have segments corresponding to the segments of each sequence of interest; and comparing the database sequences appearing in the profile of each sequence of interest to the database sequences appearing in the profile of every other sequence of interest, wherein similar profiles indicate that the sequences of interest correspond to related proteins while dissimilar profiles indicate that the sequences of interest do not correspond to related proteins, wherein profiles are similar if there is at least a 30% overlap between the database sequences appearing in the profiles of the sequences of interest.
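The profile comparison described in the abstract can be illustrated with a minimal Python sketch. Several details here are assumptions, not taken from the abstract: segment matching is approximated by exact shared k-mers (a real system would more plausibly use an alignment search against the database), overlap is defined as the shared database sequences divided by the size of the smaller profile, and the names `segments`, `build_profile`, `profile_overlap`, and `related_pairs`, as well as the toy sequences, are hypothetical. Only the 30% threshold comes from the abstract.

```python
from itertools import combinations


def segments(seq, k):
    """Return the set of all length-k segments (k-mers) of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_profile(query, database, k=3):
    """Profile = IDs of database sequences sharing at least one segment with the query.

    Assumption: exact k-mer identity stands in for whatever segment
    comparison the actual method uses.
    """
    query_segments = segments(query, k)
    return {
        db_id
        for db_id, db_seq in database.items()
        if query_segments & segments(db_seq, k)
    }


def profile_overlap(a, b):
    """Shared entries as a fraction of the smaller profile (assumed definition)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def related_pairs(profiles, threshold=0.30):
    """Return pairs of sequences of interest whose profiles overlap by >= threshold."""
    return [
        (name_a, name_b)
        for name_a, name_b in combinations(sorted(profiles), 2)
        if profile_overlap(profiles[name_a], profiles[name_b]) >= threshold
    ]


# Toy data, invented purely for illustration.
database = {
    "d1": "MKTAYIAK",
    "d2": "AYIAKQRQ",
    "d3": "GGGSGGGS",
    "d4": "QRQISFVK",
}
queries = {"q1": "MKTAYI", "q2": "YIAKQR", "q3": "GSGGGS"}

profiles = {name: build_profile(seq, database) for name, seq in queries.items()}
related = related_pairs(profiles)
print(profiles)
print(related)  # [('q1', 'q2')]
```

In this toy run, `q1` and `q2` both match `d1` and `d2`, so their profiles overlap fully and they are reported as related, while `q3` matches only `d3` and is related to neither. A different overlap definition (for example, intersection over union) would change which pairs clear the 30% threshold.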