International Business Machines Corporation - Armonk NY
International Classification:
G06F 17/00
US Classification:
706 48, 341 55
Abstract:
Given an input sequence of data, a motif is a repeating pattern. The data could be a sequence of characters, a sequence of sets of characters, or even a sequence of real values. In the first two cases, the number of motifs can potentially be exponential in the size of the input sequence, and in the third case there can be an uncountably infinite number of motifs. By suitably defining the notions of maximality and redundancy, for any sequence of n characters there exists only a linear number (no more than 3n) of special irredundant motifs, and every other motif can be generated from these irredundant motifs.
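To make the notion of a motif and its occurrence list concrete, here is a minimal Python sketch. It assumes a motif is a string of solid characters and `.` don't-care characters, and that a pattern counts as a motif when it occurs at least `quorum` times; the names `occurrences`, `is_motif`, and the default quorum of 2 are hypothetical and not taken from the patent. It does not compute maximal or irredundant motifs.

```python
def occurrences(seq: str, motif: str, dont_care: str = ".") -> list[int]:
    """Return every start position where `motif` matches `seq`.

    A `dont_care` character in the motif matches any character of `seq`.
    """
    m = len(motif)
    hits = []
    for i in range(len(seq) - m + 1):
        window = seq[i:i + m]
        if all(p == dont_care or p == c for p, c in zip(motif, window)):
            hits.append(i)
    return hits


def is_motif(seq: str, motif: str, quorum: int = 2, dont_care: str = ".") -> bool:
    """Assumed rule: a pattern is a motif if it occurs at least `quorum` times."""
    return len(occurrences(seq, motif, dont_care)) >= quorum


print(occurrences("abcdabedab", "ab.d"))  # [0, 4]
print(occurrences("abcdabedab", "ab"))    # [0, 4, 8]
```

In this example the occurrence list of `ab.d` is contained in that of `ab`, the kind of containment between occurrence lists that maximality and redundancy reason about.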
System And Method For Encoding And Detecting Extensible Patterns
International Business Machines Corporation - Armonk NY
International Classification:
G06F 17/30 G06F 7/00
US Classification:
707 6, 707 100
Abstract:
Given an input sequence of data, a rigid pattern is a repeating sequence, possibly interspersed with don't-care characters. The data can be a sequence of characters, a sequence of sets of characters, or even a sequence of real values. In practice, the patterns or motifs of interest are those that also allow a variable number of gaps (or don't-care characters); such patterns with spacers are termed extensible patterns. In a bioinformatics context, similar patterns have also been called flexible patterns or motifs. A system according to the invention discovers all the maximal extensible motifs in the input. The flexibility is succinctly defined by a single integer parameter D ≥ 1, interpreted as allowing a space of between 1 and D characters between two successive solid characters in a reported motif.
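The role of the spacer parameter D can be illustrated with a small occurrence checker. This is a sketch under a simplifying assumption: the extensible motif is given only as its ordered solid characters, and every consecutive pair is separated by between 1 and D arbitrary characters, following the reading of D in the abstract above. The function name `extensible_occurrences` is hypothetical, and the sketch finds occurrences of a given motif rather than discovering maximal motifs.

```python
from functools import lru_cache


def extensible_occurrences(seq: str, solids: str, D: int) -> list[int]:
    """Start positions where the characters of `solids` occur in order,
    with between 1 and D arbitrary characters between each consecutive pair.
    """
    if D < 1:
        raise ValueError("D must be >= 1")
    n = len(seq)

    @lru_cache(maxsize=None)
    def extends(pos: int, k: int) -> bool:
        # seq[pos] has matched solids[k]; try to place the remaining solid characters.
        if k == len(solids) - 1:
            return True
        for gap in range(1, D + 1):
            nxt = pos + gap + 1  # skip `gap` spacer characters
            if nxt < n and seq[nxt] == solids[k + 1] and extends(nxt, k + 1):
                return True
        return False

    return [i for i in range(n) if solids and seq[i] == solids[0] and extends(i, 0)]


print(extensible_occurrences("axbyyc", "abc", 2))  # [0]
print(extensible_occurrences("axbyyc", "abc", 1))  # []
```

With D = 2 the gap of two characters between `b` and `c` is allowed; with D = 1 it is not, so the same motif no longer occurs.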
Method And Structure For Lossy Compression Of Continuous Data With Extensible Motifs
Laxmi Priya Parida - Mohegan Lake NY, US Alberto Apostolico - West Lafayette IN, US Matteo Comin - Venice, IT
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
H03M 7/30
US Classification:
341 87, 707 6
Abstract:
A method (and structure) of data processing in which data is represented in a lossy data format as a plurality of extensible motifs. Each extensible motif has at least one don't-care character enclosed by at least one non-don't-care character on a left side and at least one non-don't-care character on a right side.
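The structural condition in this abstract can be stated as a short check. The sketch below assumes a motif is written as a string in which `.` stands for the don't-care character; the function name `has_enclosed_dont_care` is hypothetical, and the lossy encoding itself is not shown.

```python
def has_enclosed_dont_care(pattern: str, dont_care: str = ".") -> bool:
    """True if some don't-care has a solid character on its left and on its right."""
    solid = [i for i, c in enumerate(pattern) if c != dont_care]
    if len(solid) < 2:
        return False
    # Any don't-care strictly between the first and last solid character is enclosed.
    return dont_care in pattern[solid[0] + 1:solid[-1]]


print(has_enclosed_dont_care("a.b"))  # True
print(has_enclosed_dont_care(".ab"))  # False
```

A pattern such as `.ab` fails because its only don't-care has no solid character to its left.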
Object Classification Using An Optimized Boolean Expression
Laxmi Priya Parida - Mohegan Lake NY, US Alberto Apostolico - West Lafayette IN, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
H03M 7/00
US Classification:
341 60, 341 51, 341 67
Abstract:
An apparatus for data compression includes an identifier which identifies a plurality of irredundant patterns in a data set, and an extractor which extracts at least a portion of the plurality of irredundant patterns from the data set to generate a compressed data set.
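The identifier/extractor split can be pictured with a plain dictionary-substitution scheme. This is a swapped-in illustration, not the patented method: the caller supplies the patterns (standing in for the identifier's output), each pattern is replaced by a single placeholder token, and the table is kept for decoding. The names `compress` and `decompress` are hypothetical, and the sketch assumes the data contains no Unicode private-use characters, which are used as tokens.

```python
def compress(data: str, patterns: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each (non-empty) pattern with a one-character token; return encoded text and table."""
    table: dict[str, str] = {}
    out = data
    for idx, pattern in enumerate(patterns):
        token = chr(0xE000 + idx)  # private-use code point used as a placeholder token
        table[token] = pattern
        out = out.replace(pattern, token)
    return out, table


def decompress(encoded: str, table: dict[str, str]) -> str:
    """Undo the substitutions in reverse order of application."""
    out = encoded
    for token, pattern in reversed(list(table.items())):
        out = out.replace(token, pattern)
    return out


enc, table = compress("abcabcxyabc", ["abc"])
print(len(enc))                 # 5
print(decompress(enc, table))   # abcabcxyabc
```

Reversing the substitutions in the opposite order restores each intermediate string exactly, so the round trip is lossless under the stated assumption.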
Topological Motifs Discovery Using A Compact Notation
International Business Machines Corporation - Armonk NY
International Classification:
G06N 5/00
US Classification:
706 45, 706 12, 706 48
Abstract:
Disclosed are a method of and a system for identifying a motif in a graph. The graph has multiple vertices, and the vertices have one or more attributes. The method comprises the steps of: for each vertex having at least one defined attribute, identifying the set of vertices, if any, that are adjacent to that vertex and have at least one specified attribute; and forming a first list comprising the identified sets. The method comprises the further steps of determining the unique intersections of the sets on the first list; computing compact forms of the sets on the first list; and identifying a motif of the graph from the unique intersections.
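The first two steps (building the list of attributed neighbor sets and closing it under intersection) can be sketched as follows. The sketch assumes an undirected graph given as an adjacency dictionary and vertex attributes given as sets of labels; the names `neighbor_sets` and `unique_intersections` are hypothetical, and the compact-form computation and the final motif identification are not shown.

```python
def neighbor_sets(adj: dict[int, set[int]], attrs: dict[int, set[str]],
                  src_attr: str, nbr_attr: str) -> list[frozenset[int]]:
    """For each vertex with `src_attr`, the set of its neighbors having `nbr_attr`."""
    result = []
    for v, nbrs in adj.items():
        if src_attr in attrs[v]:
            s = frozenset(u for u in nbrs if nbr_attr in attrs[u])
            if s:
                result.append(s)
    return result


def unique_intersections(sets: list[frozenset[int]]) -> set[frozenset[int]]:
    """Close the list under pairwise intersection, keeping non-empty unique sets."""
    closed = set(sets)
    frontier = set(sets)
    while frontier:
        new = set()
        for a in frontier:
            for b in closed:
                c = a & b
                if c and c not in closed:
                    new.add(c)
        closed |= new
        frontier = new  # only newly found sets can produce further intersections
    return closed


adj = {1: {2, 3}, 2: {1, 4}, 3: {1, 4, 6}, 4: {2, 3, 5}, 5: {4, 6}, 6: {3, 5}}
attrs = {1: {"A"}, 4: {"A"}, 6: {"A"}, 2: {"B"}, 3: {"B"}, 5: {"B"}}
print(unique_intersections(neighbor_sets(adj, attrs, "A", "B")))
```

Here the three A-vertices see the B-neighbor sets {2, 3}, {2, 3, 5}, and {3, 5}; their closure adds {3}, the B-vertex shared by every A-vertex.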
Pattern Discovery Techniques For Determining Maximal Irredundant And Redundant Motifs
International Business Machines Corporation - Armonk NY
International Classification:
G01N 33/48 G06G 7/48 C12Q 1/68
US Classification:
702 19, 702 20, 703 11, 435 6
Abstract:
Basis motifs are determined from an input sequence through an iterative technique that begins by creating small solid motifs and continues to create larger motifs that include “don't care” characters and can include flexible portions. The small solid motifs, including don't care characters and flexible portions, are concatenated to create larger motifs. During each iteration, motifs are trimmed to remove redundant motifs and other motifs that do not meet certain criteria. The process continues until no new motifs are determined. At this point, the basis set of motifs has been determined. The basis motifs are used to construct redundant motifs. The redundant motifs are formed by determining a number of sets for selected basis motifs. From these sets, unique intersection sets are determined. The redundant motifs are determined from the unique intersection sets and the basis motifs.
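The grow-and-prune loop described above can be illustrated with a much simpler pattern-growth sketch. It starts from single solid characters, extends each motif by up to `max_gap` don't-care characters followed by one solid character, discards candidates with fewer than `quorum` occurrences, and stops when no new motifs appear. This is an assumed simplification, not the patent's procedure: there is no redundancy trimming, no maximality test, and no flexible portions, and the names `grow_motifs`, `quorum`, and `max_gap` are hypothetical.

```python
def occurrences(seq: str, motif: str, dont_care: str = ".") -> list[int]:
    """Start positions where `motif` matches `seq` (don't-care matches anything)."""
    m = len(motif)
    return [i for i in range(len(seq) - m + 1)
            if all(p == dont_care or p == c for p, c in zip(motif, seq[i:i + m]))]


def grow_motifs(seq: str, quorum: int = 2, max_gap: int = 1, dont_care: str = ".") -> set[str]:
    """Iteratively extend motifs, keeping those that still meet the quorum."""
    alphabet = sorted(set(seq))
    current = [c for c in alphabet if len(occurrences(seq, c, dont_care)) >= quorum]
    found = set(current)
    while current:  # stop once an iteration produces no new motifs
        nxt = []
        for motif in current:
            for gap in range(max_gap + 1):
                for c in alphabet:
                    cand = motif + dont_care * gap + c
                    if cand not in found and len(occurrences(seq, cand, dont_care)) >= quorum:
                        found.add(cand)
                        nxt.append(cand)
        current = nxt
    return found


print(sorted(grow_motifs("axbayb")))  # ['a', 'a.b', 'b']
```

The loop terminates because every candidate is longer than its parent and no motif longer than the sequence can meet the quorum.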
Methods And Systems For Conservative Extraction Of Over-Represented Extensible Motifs
Alberto Apostolico - Atlanta GA, US Matteo Comin - Venice, IT Laxmi Priya Parida - Mohegan Lake NY, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 19/00 G06F 15/00 G11C 17/00
US Classification:
702 20, 700 1, 305 94
Abstract:
Methods and systems of extracting extensible motifs from a sequence include assigning a significance to extensible motifs within the sequence based upon a syntactic and statistical analysis, and identifying extensible motifs having a significance that exceeds a predetermined threshold.
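One common way to score over-representation is a z-score of the observed count against the count expected under an i.i.d. background model. The sketch below uses that standard measure as a stand-in: it handles rigid motifs only, estimates character probabilities from the sequence itself, and uses a binomial variance that ignores overlapping occurrences. It does not reproduce the patent's syntactic criteria or its scoring of extensible motifs, and the names `z_score` and `significant` are hypothetical.

```python
import math
from collections import Counter


def occurrences(seq: str, motif: str, dont_care: str = ".") -> list[int]:
    """Start positions where `motif` matches `seq` (don't-care matches anything)."""
    m = len(motif)
    return [i for i in range(len(seq) - m + 1)
            if all(p == dont_care or p == c for p, c in zip(motif, seq[i:i + m]))]


def z_score(seq: str, motif: str, dont_care: str = ".") -> float:
    """(observed - expected) / std under an i.i.d. model, ignoring overlaps."""
    n, m = len(seq), len(motif)
    trials = n - m + 1
    if trials <= 0:
        return 0.0
    freq = Counter(seq)
    p = 1.0
    for c in motif:
        if c != dont_care:
            p *= freq[c] / n  # empirical probability of each solid character
    expected = trials * p
    var = trials * p * (1 - p)
    observed = len(occurrences(seq, motif, dont_care))
    return (observed - expected) / math.sqrt(var) if var > 0 else 0.0


def significant(seq: str, motifs: list[str], threshold: float) -> list[str]:
    """Keep the motifs whose score exceeds the predetermined threshold."""
    return [mo for mo in motifs if z_score(seq, mo) > threshold]


print(significant("abababab", ["ab", "ba", "aa"], 1.5))  # ['ab']
```

For `abababab`, the motif `ab` is expected 1.75 times but observed 4 times, giving a score of about 1.96, while `ba` (observed 3 times) scores about 1.09.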