Managing Resource Allocation In A Stream Processing Framework
- San Francisco CA, US Jeffrey CHAO - San Francisco CA, US
International Classification:
G06F 9/50
Abstract:
The technology disclosed herein relates to method, system, and computer program product (computer-readable storage device) embodiments for managing resource allocation in a stream processing framework. An embodiment operates by configuring an allocation of a task sequence and machine resources to a container, partitioning a data stream into a plurality of batches arranged for parallel processing by the container via the machine resources allocated to the container, and running the task sequence, including running at least one batch of the plurality of batches. Some embodiments may also change the allocation responsive to a determination of an increase in data volume, and may further restore the allocation to a previous state responsive to a determination of a decrease in data volume. Additionally, time-based throughput of the data stream may be monitored for a given worker node configured to run a batch of the plurality of batches.
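The elastic allocation described above can be sketched as follows. This is an illustrative toy model only, assuming a simple worker-count allocation; the class and method names (`Container`, `scale_for_volume`, the high/low watermarks) are hypothetical and not taken from the patent or any real framework.

```python
# Sketch of the scheme above: a container holds a machine-resource
# allocation, the stream is partitioned into batches for parallel
# processing, and the allocation grows on a volume spike and reverts
# to its previous state when volume drops again.

class Container:
    def __init__(self, workers):
        self.workers = workers          # machine resources allocated
        self._history = []              # previous allocation states

    def partition(self, stream, batch_size):
        """Split the incoming stream into batches for parallel processing."""
        return [stream[i:i + batch_size]
                for i in range(0, len(stream), batch_size)]

    def scale_for_volume(self, volume, high_mark, low_mark):
        """Grow the allocation on a volume increase; revert on a decrease."""
        if volume > high_mark:
            self._history.append(self.workers)
            self.workers *= 2           # allocate additional machine resources
        elif volume < low_mark and self._history:
            self.workers = self._history.pop()  # restore previous allocation

c = Container(workers=4)
batches = c.partition(list(range(10)), batch_size=4)
c.scale_for_volume(volume=1000, high_mark=500, low_mark=100)  # spike: 8 workers
c.scale_for_volume(volume=50, high_mark=500, low_mark=100)    # drop: back to 4
```

The history stack is what makes "changing the allocation to a previous state" cheap: a decrease simply pops the last saved allocation rather than recomputing one.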
- San Francisco CA, US Jeffrey Chao - San Francisco CA, US
Assignee:
salesforce.com, inc. - San Francisco CA
International Classification:
G06F 11/14 G06F 11/20
Abstract:
The technology disclosed relates to discovering multiple previously unknown and undetected technical problems in the fault tolerance and data recovery mechanisms of modern stream processing systems. In addition, it relates to providing technical solutions to these previously unknown and undetected problems. In particular, the technology disclosed relates to discovering the problem of modification of the batch size of a given batch during its replay after a processing failure. This problem results in an over-count when the input during replay is not a superset of the input fed at the original play. Further, the technology disclosed discovers the problem of inaccurate counter updates in the replay schemes of modern stream processing systems when one or more keys disappear between a batch's first play and its replay. This problem is exacerbated when data in batches is merged or mapped with data from an external data store.
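The over-count failure mode described above can be demonstrated in a few lines. This is an illustrative reconstruction, not code from the patent: a batch is counted, the system fails before acknowledgement, and the replayed batch has a different boundary, so events from the first play are counted twice.

```python
# Demonstration of the replay over-count problem: counter updates from the
# original play are not rolled back, and the replayed batch re-delivers
# some (but not all) of the same keys.

counter = {}

def process(batch):
    """Increment a per-key counter for every event in the batch."""
    for key in batch:
        counter[key] = counter.get(key, 0) + 1

original = ["a", "b", "c"]
process(original)            # original play: a=1, b=1, c=1

# A failure occurs after counting but before acknowledgement. On replay the
# batch is re-formed with a different size, so its input is not aligned
# with the input fed at the original play ("c" lands in a later batch).
replay = ["a", "b"]
process(replay)

# "a" and "b" are now over-counted relative to the true event stream:
assert counter == {"a": 2, "b": 2, "c": 1}
```

A correct recovery scheme must either replay the exact original batch contents or make the counter updates idempotent per batch.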
Compact Task Deployment For Stream Processing Systems
- San Francisco CA, US Jeffrey CHAO - San Francisco CA, US
Assignee:
salesforce.com, inc. - San Francisco CA
International Classification:
G06F 9/48 G06F 9/50 G06F 9/455 G06F 9/44
Abstract:
The technology disclosed provides a novel technique for compact deployment of application code to stream processing systems. In particular, the technology disclosed relates to obviating the need to accompany application code with its dependencies during deployment (i.e., creating fat jars) by operating a stream processing system within a container defined over worker nodes of whole machines and initializing the worker nodes with precompiled dependency libraries having precompiled classes. Accordingly, the application code is deployed to the container without its dependencies, and, once deployed, the application code is linked with the locally stored precompiled dependencies at runtime. In implementations, the application code is deployed to the container running the stream processing system in between 300 milliseconds and 6 seconds. This is drastically faster than existing deployment techniques, which take anywhere from 5 to 15 minutes.
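The deploy-then-link-at-runtime idea above can be sketched with Python's module machinery as a rough analogue (the patent's setting is JVM classloading; every name below, including `stream_dep`, is hypothetical). The "worker node" is pre-initialized with the dependency; the deployment ships only the small application module, which resolves against the locally stored dependency when it runs.

```python
# Analogue of compact deployment: the dependency library is installed on the
# worker ahead of time, so a deploy ships only the application source, and
# linking happens at runtime via the normal import mechanism.
import sys
import types

# Step 1: worker node is initialized once with a precompiled dependency
# library (simulated here as an already-loaded module).
dep = types.ModuleType("stream_dep")
dep.transform = lambda x: x * 2
sys.modules["stream_dep"] = dep

# Step 2: only the application code is "deployed" -- no bundled dependencies.
app_source = (
    "import stream_dep\n"
    "def handle(event):\n"
    "    return stream_dep.transform(event)\n"
)
app = types.ModuleType("app")
exec(compile(app_source, "<deployed>", "exec"), app.__dict__)  # link at runtime

print(app.handle(21))  # prints 42
```

The payload that crosses the network is just `app_source`, a few hundred bytes instead of a fat archive, which is the source of the deployment-time savings the abstract claims.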
Maintaining Throughput Of A Stream Processing Framework While Increasing Processing Load
- San Francisco CA, US Jeffrey Chao - San Francisco CA, US
Assignee:
salesforce.com, inc. - San Francisco CA
International Classification:
G06F 9/50 G06F 3/06
Abstract:
The technology disclosed relates to maintaining throughput of a stream processing framework while increasing processing load. In particular, it relates to defining a container over at least one worker node that has a plurality of workers, with one worker utilizing a whole core within a worker node, and queuing data from one or more incoming near real-time (NRT) data streams in multiple pipelines that run in the container and have connections to at least one common resource external to the container. It further relates to concurrently executing the pipelines at a number of workers as batches, and limiting simultaneous connections to the common resource to the number of workers by providing a shared connection to a set of batches running on a same worker regardless of the pipelines to which the batches in the set belong.
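The connection-capping rule above (one shared connection per worker, regardless of pipeline) can be sketched as a small pool. Names are illustrative, not from the patent.

```python
# Sketch of the scheme above: every batch scheduled on a worker uses that
# worker's single shared connection to the external resource, so the number
# of simultaneous connections is bounded by the number of workers, no matter
# how many pipelines are running.

class SharedConnectionPool:
    def __init__(self):
        self._by_worker = {}

    def connection_for(self, worker_id):
        """One connection per worker, shared by all batches on that worker."""
        if worker_id not in self._by_worker:
            self._by_worker[worker_id] = object()  # stand-in for a real connection
        return self._by_worker[worker_id]

pool = SharedConnectionPool()
c1 = pool.connection_for(0)   # pipeline A batch on worker 0
c2 = pool.connection_for(0)   # pipeline B batch on the same worker
c3 = pool.connection_for(1)   # pipeline A batch on worker 1

assert c1 is c2                     # same worker, shared connection
assert len(pool._by_worker) == 2    # connections bounded by worker count
```

Keying the pool by worker rather than by pipeline is the whole trick: adding pipelines increases load but never increases the connection count at the external resource.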
Managing Processing Of Long Tail Task Sequences In A Stream Processing Framework
- SAN FRANCISCO CA, US Jeffrey Chao - San Francisco CA, US
Assignee:
salesforce.com, inc. - SAN FRANCISCO CA
International Classification:
G06F 9/50 G06F 17/30
Abstract:
The technology disclosed relates to managing processing of long tail task sequences in a stream processing framework. In particular, it relates to operating a computing grid that includes a plurality of physical threads that process data from one or more near real-time (NRT) data streams for multiple task sequences, and queuing data from the NRT data streams as batches in multiple pipelines using a grid-coordinator that controls dispatch of the batches to the physical threads. The method also includes assigning a priority-level to each of the pipelines using a grid-scheduler, wherein the grid-scheduler initiates execution of a first number of batches from a first pipeline before execution of a second number of batches from a second pipeline, responsive to respective priority levels of the first and second pipelines.
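The priority-driven dispatch above can be sketched with a heap. This is a simplified illustration, assuming a lower number means higher priority; `dispatch_order` and the pipeline names are hypothetical.

```python
# Sketch of the grid-scheduler behaviour above: batches from a
# higher-priority pipeline are dispatched before batches from a
# lower-priority (long tail) pipeline.
import heapq

def dispatch_order(pipelines):
    """pipelines: list of (priority, name, batches); lower number runs first."""
    heap = [(prio, name, list(batches)) for prio, name, batches in pipelines]
    heapq.heapify(heap)
    order = []
    while heap:
        prio, name, batches = heapq.heappop(heap)
        order.extend((name, b) for b in batches)  # drain this pipeline's batches
    return order

order = dispatch_order([(2, "long_tail", ["b1", "b2"]),
                        (1, "hot", ["a1"])])
assert order == [("hot", "a1"), ("long_tail", "b1"), ("long_tail", "b2")]
```

A real grid-scheduler would interleave rather than fully drain each pipeline, but the ordering constraint is the same: higher-priority batches start first.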
Managing Resource Allocation In A Stream Processing Framework
- SAN FRANCISCO CA, US Jeffrey Chao - San Francisco CA, US
Assignee:
salesforce.com, inc. - SAN FRANCISCO CA
International Classification:
G06F 9/50 G06F 9/52
Abstract:
The technology disclosed relates to managing resource allocation to task sequences in a stream processing framework. In particular, it relates to operating a computing grid that includes machine resources, with heterogeneous containers defined over whole machines and some containers including multiple machines. It also includes initially allocating multiple machines to a first container, initially allocating a first set of stateful task sequences to the first container, and running the first set of stateful task sequences as multiplexed units of work under control of a container-scheduler, where each unit of work for a first task sequence runs to completion on first machine resources in the first container, unless it overruns a time-out, before a next unit of work for a second task sequence runs multiplexed on the first machine resources. It further includes automatically modifying the number of machine resources and/or the number of task sequences assigned to a container.
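The run-to-completion-unless-timed-out multiplexing above can be sketched as follows. This is a toy model (real schedulers preempt rather than measure after the fact); `run_multiplexed` and the sequence names are hypothetical.

```python
# Sketch of the container-scheduler behaviour above: units of work from
# several task sequences share the same machine resources; each unit runs
# to completion unless it overruns a time-out, after which the next
# sequence's unit gets the resources.
import time

def run_multiplexed(task_sequences, timeout_s=0.05):
    """Run one unit of work per sequence in turn; flag units that overran."""
    completed, timed_out = [], []
    for name, unit in task_sequences:
        start = time.monotonic()
        result = unit()                          # runs to completion...
        if time.monotonic() - start > timeout_s:
            timed_out.append(name)               # ...but overran the time-out
        else:
            completed.append((name, result))
    return completed, timed_out

done, late = run_multiplexed([("seq1", lambda: 1 + 1),
                              ("seq2", lambda: time.sleep(0.1) or "slow")])
assert done == [("seq1", 2)]
assert late == ["seq2"]
```

The time-out is what keeps one stateful task sequence from starving the others while still letting well-behaved units finish without interruption.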
Apple Aug 2015 - Jan 2018
Global Supply Manager, Procurement
Apple Aug 2015 - Jan 2018
Producer, Events
Apple Jun 2014 - May 2015
Mba Summer Intern, Worldwide Operations Procurement
Phillips 66 Aug 2011 - Jun 2013
Senior Financial Analyst, Commercial Trading
KPMG UK Aug 2007 - Aug 2011
Senior Associate, Performance and Technology Advisory
Education:
The University of Chicago Booth School of Business 2013 - 2015
Master of Business Administration, Masters, Economics, Management, Statistics, Entrepreneurship
The University of Texas at Austin 2003 - 2007
Bachelors, Bachelor of Business Administration, Corporate Finance
Skills:
Business Process Improvement Financial Analysis Financial Modeling Financial Reporting Managerial Finance Financial Accounting Management Consulting Oil/Gas Finance Transformation Project Management Strategy Financial Management Sourcing and Procurement Spend Analysis Cost Accounting Commercial Finance Natural Gas Liquids Oil&Gas Midstream Digital Media Licensing Sports Business
SK Hynix Memory Solutions Inc.
Staff Engineer
Supermicro Aug 2010 - Aug 2018
Storage Validation Engineer
Education:
Santa Clara University 2004 - 2009
Bachelors, Bachelor of Science, Electrical Engineering
New York University 2000 - 2002
Master of Science, Masters, Computer Science
National Tsing Hua University 1995 - 1999
Bachelors, Bachelor of Science, Computer Science
Skills:
Debugging Computer Architecture Hardware Linux Integration Troubleshooting Red Hat Linux Testing Computer Hardware Management Storage
Starr Indemnity Insurance Company - San Francisco since Apr 2012
Underwriting Technician
Chartis Insurance - San Francisco Bay Area Jul 2010 - Apr 2012
Underwriting Technician
Chartis - San Francisco Bay Area Sep 2009 - Jul 2010
Service Specialist
AIG - San Francisco Bay Area Dec 2007 - Sep 2009
Administrative Assistant
American International Group Feb 2007 - Dec 2007
File Administrator
Education:
University of California, Riverside 2002 - 2006
Bachelor of Science, Business Administration
West Valley College
Skills:
Risk Management Insurance Underwriting Commercial Insurance Powerpoint Microsoft Excel Workers Compensation Customer Service Brokers
Chevron - Richmond, CA since Oct 2012
Design Engineer
The Berkeley Group Jan 2011 - May 2012
Project Leader
The Berkeley Group Jan 2011 - May 2011
Consultant
Chang Chun Petrochemical Group May 2009 - Aug 2009
Design Engineering Intern
Education:
University of California, Berkeley 2008 - 2012
B.S., Mechanical Engineering
Peking University 2010 - 2010
Oxford Academy 2002 - 2008
Skills:
Matlab Solidworks Consulting Nonprofits Labview Services Fundraising San Mechanical Engineering Non Profits
Nissan Motor Corporation Feb 2007 - Aug 2007
District Parts and Service Specialist
Nissan Motor Corporation May 2004 - Mar 2007
Sales and Marketing Rotation Program
AT&T May 2004 - Mar 2007
Business Development Manager
Education:
University of California, Irvine - the Paul Merage School of Business 2007 - 2009
Master of Business Administration, Masters, Marketing, Management
WU (Vienna University of Economics and Business) 2008 - 2008
UC San Diego 1996 - 2000
Bachelors, Bachelor of Arts, Psychology
UC Irvine
Master of Business Administration, Masters
University of California
Skills:
Cross Functional Team Leadership Project Management Leadership Management Sales Telecommunications Mobile Devices Marketing Strategy Negotiation Management Consulting E Commerce Brand Management
Abdominal Hernia Breast Disorders Cholelithiasis or Cholecystitis Malignant Neoplasm of Female Breast Inguinal Hernia
Languages:
English Spanish
Description:
Dr. Chao graduated from the University of Louisville School of Medicine in 2001. He works in Riverside, CA and specializes in General Surgery. Dr. Chao is affiliated with Kaiser Permanente Riverside Medical Center.