Douglas Ronald Burdick - San Jose CA, US Amol Ghoting - Yorktown Heights NY, US Rajasekar Krishnamurthy - San Jose CA, US Edwin Peter Dawson Pednault - Yorktown Heights NY, US Berthold Reinwald - San Jose CA, US Vikas Sindhwani - Yorktown Heights NY, US Shirish Tatikonda - San Jose CA, US Yuanyuan Tian - San Jose CA, US Shivakumar Vaithyanathan - San Jose CA, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 15/18
US Classification:
706 12
Abstract:
Systems and methods for processing Machine Learning (ML) algorithms in a MapReduce environment are described. In one embodiment of a method, the method includes receiving an ML algorithm to be executed in the MapReduce environment. The method further includes parsing the ML algorithm into a plurality of statement blocks in a sequence, wherein each statement block comprises a plurality of basic operations (hops). The method also includes automatically determining an execution plan for each statement block, wherein at least one of the execution plans comprises one or more low-level operations (lops). The method further includes implementing the execution plans in the sequence of the plurality of the statement blocks.
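The flow in this abstract (statement blocks, then hops, then lops, executed in block order) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the script format (one hop name per line, blank lines separating statement blocks), the `HOP_TO_LOPS` table, and the function names are hypothetical and are not taken from the patent.

```python
# Hypothetical mapping from basic operations (hops) to low-level operations (lops).
# The names are illustrative only; the abstract does not list concrete operators.
HOP_TO_LOPS = {
    "transpose": ["transpose"],
    "matmul": ["partition", "multiply", "aggregate"],
    "add": ["elementwise_add"],
}


def parse_statement_blocks(script):
    """Split a toy script into statement blocks (assumed: blank-line separated)."""
    blocks = []
    for chunk in script.strip().split("\n\n"):
        hops = [line.strip() for line in chunk.splitlines() if line.strip()]
        if hops:
            blocks.append(hops)
    return blocks


def plan_block(hops):
    """Build an execution plan for one block by expanding each hop into its lops."""
    plan = []
    for hop in hops:
        plan.extend(HOP_TO_LOPS[hop])
    return plan


def plan_program(script):
    """Return one execution plan per statement block, in the original block order."""
    return [plan_block(block) for block in parse_statement_blocks(script)]


if __name__ == "__main__":
    toy_script = "transpose\nmatmul\n\nadd\n"
    for index, plan in enumerate(plan_program(toy_script)):
        print(f"block {index}: {plan}")
```

A real planner would choose among alternative lops based on data characteristics; the fixed lookup table here only shows the shape of the pipeline.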
- Armonk NY, US Mustafa Canim - San Jose CA, US Douglas Ronald Burdick - San Jose CA, US
International Classification:
G06F 16/35 G06F 40/30 G06F 40/131
Abstract:
Organizing and/or aligning fragments of text that are included in a set of physical and/or digital documents so that the arrangement of the text fragments is in a readily understandable and meaningful format for a given reader. This organization and/or alignment uses a relation model of the various text fragments to correlate a meaning between and amongst the various text fragments to ultimately determine the final alignment and/or arrangement of those text fragments.
Vision-Based Cell Structure Recognition Using Hierarchical Neural Networks
- Armonk NY, US Douglas R. Burdick - San Jose CA, US Xinyi Zheng - Ann Arbor MI, US
International Classification:
G06K 9/00 G06T 7/10 G06N 3/04 G06K 9/68 G06K 9/66
Abstract:
Methods, systems, and computer program products for vision-based cell structure recognition using hierarchical neural networks and cell boundaries to structure clustering are provided herein. A computer-implemented method includes detecting a style of the given table using at least one style classification model; selecting, based at least in part on the detected style, a cell detection model appropriate for the detected style; detecting cells within the given table using the selected cell detection model; and outputting, to at least one user, information pertaining to the detected cells comprising image coordinates of one or more bounding boxes associated with the detected cells.
Automated Non-Native Table Representation Annotation For Machine-Learning Models
One embodiment provides a method, including: receiving two documents, one of the two documents having at least one table that includes the same information as a corresponding table in the other of the two documents, wherein (i) one of the two documents comprises the at least one table in an unstructured table representation and (ii) the other of the two documents comprises the at least one table in a structured table representation; identifying text elements within the at least one table in the unstructured table representation; matching the identified text elements with table elements within the at least one table in the structured table representation; and annotating the at least one table in the structured table representation based upon the matches between the table elements and text elements.
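The matching and annotation steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: text elements are assumed to be `(text, bbox)` pairs extracted from the unstructured table, structured cells are assumed to be dictionaries with `row`, `col`, and `text` keys, and matching is assumed to be exact equality after whitespace and case normalization. The names `normalize` and `annotate_cells` are hypothetical.

```python
def normalize(text):
    """Collapse whitespace and lowercase so trivially different strings compare equal."""
    return " ".join(text.split()).lower()


def annotate_cells(text_elements, structured_cells):
    """Attach the bounding box of a matching text element to each structured cell.

    text_elements: list of (text, bbox) pairs from the unstructured table.
    structured_cells: list of dicts with "row", "col", and "text" keys.
    Each text element is used at most once; unmatched cells get bbox None.
    """
    unused = list(text_elements)
    annotated = []
    for cell in structured_cells:
        wanted = normalize(cell["text"])
        match = next((t for t in unused if normalize(t[0]) == wanted), None)
        if match is not None:
            unused.remove(match)
        annotated.append({**cell, "bbox": match[1] if match is not None else None})
    return annotated


if __name__ == "__main__":
    elements = [("Revenue ", (10, 10, 60, 20)), ("42", (70, 10, 90, 20))]
    cells = [
        {"row": 0, "col": 0, "text": "revenue"},
        {"row": 0, "col": 1, "text": "42"},
        {"row": 1, "col": 0, "text": "Cost"},
    ]
    for cell in annotate_cells(elements, cells):
        print(cell)
```

Exact matching keeps the sketch short; fuzzy or position-aware matching would be needed for noisy extraction output.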
- Armonk NY, US Douglas R. Burdick - San Jose CA, US Hima P. Karanam - Hyderabad, IN Rajasekar Krishnamurthy - Campbell CA, US Lucian Popa - San Jose CA, US Shivakumar Vaithyanathan - San Jose CA, US
International Classification:
G06F 17/30
Abstract:
Embodiments relate to entity resolution. One aspect includes creating a deterministic model by defining an entity to be resolved, selecting two datasets for comparison, defining matching predicates for attributes of the datasets to select a set of candidate matches, and defining a precedence rule for the candidate matches to select a subset of the candidate matches. An aspect further includes running the deterministic model on the two datasets. Running the deterministic model includes applying the matching predicates and the precedence rule to data in the datasets that correspond to the attributes. An aspect also includes applying a cardinality rule to results of the running, and outputting the matching candidates for which the cardinality rule is satisfied.
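The three stages named in this abstract (matching predicates, a precedence rule, and a cardinality rule) can be sketched in a few lines. The concrete choices below are assumptions for illustration only: predicates are ordered from most to least trusted, the precedence rule keeps only each left record's best-ranked candidates, and the cardinality rule allows at most one match per left record. The function and field names are hypothetical.

```python
from collections import Counter


def resolve(left, right, predicates, max_matches_per_left=1):
    """Toy deterministic entity resolution over two lists of record dicts."""
    # Matching predicates: a pair is a candidate if any predicate holds;
    # its rank is the index of the first predicate that holds.
    candidates = []
    for l in left:
        for r in right:
            for rank, predicate in enumerate(predicates):
                if predicate(l, r):
                    candidates.append((l["id"], r["id"], rank))
                    break

    # Precedence rule (assumed): keep only candidates with the best rank per left record.
    best = {}
    for lid, _, rank in candidates:
        best[lid] = min(rank, best.get(lid, rank))
    preferred = [(lid, rid) for lid, rid, rank in candidates if rank == best[lid]]

    # Cardinality rule (assumed): drop left records that still have too many matches.
    counts = Counter(lid for lid, _ in preferred)
    return [(lid, rid) for lid, rid in preferred if counts[lid] <= max_matches_per_left]


def same_email(l, r):
    return bool(l["email"]) and l["email"] == r["email"]


def same_name(l, r):
    return l["name"] == r["name"]


if __name__ == "__main__":
    left = [
        {"id": "a1", "name": "Ann Lee", "email": "ann@example.com"},
        {"id": "a2", "name": "Bob Ray", "email": ""},
    ]
    right = [
        {"id": "b1", "name": "Ann Lee", "email": "ann@example.com"},
        {"id": "b2", "name": "Ann Lee", "email": "other@example.com"},
        {"id": "b3", "name": "Bob Ray", "email": ""},
        {"id": "b4", "name": "Bob Ray", "email": ""},
    ]
    print(resolve(left, right, [same_email, same_name]))
```

In this example `a1` matches `b1` by email, which outranks the name-only match with `b2`; `a2` has two equally ranked name matches, so the cardinality rule rejects both.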
Hybrid Parallelization Strategies For Machine Learning Programs On Top Of MapReduce
- Armonk NY, US Douglas Burdick - San Jose CA, US Berthold Reinwald - San Jose CA, US Prithviraj Sen - San Jose CA, US Shirish Tatikonda - San Jose CA, US Yuanyuan Tian - San Jose CA, US Shivakumar Vaithyanathan - San Jose CA, US
International Classification:
G06F 9/45
Abstract:
Hybrid parallelization strategies for machine learning programs on top of MapReduce are provided. In one embodiment, a method of and computer program product for parallel execution of machine learning programs are provided. Program code is received. The program code contains at least one parallel for statement having a plurality of iterations. A parallel execution plan is determined for the program code. According to the parallel execution plan, the plurality of iterations is partitioned into a plurality of tasks. Each task comprises at least one iteration. The iterations of each task are independent.
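The partitioning step can be sketched as follows: the iterations of a parallel for loop are split into tasks of at least one iteration each, and the tasks run concurrently. This is a minimal sketch under assumptions, not the patented planner: fixed-size contiguous partitioning and a local thread pool stand in for whatever plan and runtime the optimizer would choose, and `partition_iterations` and `run_parallel_for` are hypothetical names. It relies on the loop body being independent across iterations, as the abstract requires.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_iterations(num_iterations, task_size):
    """Split iteration indices 0..num_iterations-1 into contiguous tasks."""
    if task_size < 1:
        raise ValueError("task_size must be at least 1")
    return [
        list(range(start, min(start + task_size, num_iterations)))
        for start in range(0, num_iterations, task_size)
    ]


def run_parallel_for(body, num_iterations, task_size, max_workers=4):
    """Run body(i) for every iteration, one task per worker call, results in order."""
    tasks = partition_iterations(num_iterations, task_size)

    def run_task(task):
        # Iterations inside one task run sequentially.
        return [(i, body(i)) for i in task]

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for chunk in pool.map(run_task, tasks):
            results.update(chunk)
    return [results[i] for i in range(num_iterations)]


if __name__ == "__main__":
    print(partition_iterations(7, 3))
    print(run_parallel_for(lambda i: i * i, 5, task_size=2))
```

The task size is the tuning knob: larger tasks reduce scheduling overhead, smaller tasks balance load better across workers.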