Abstract:
Disclosed are system and method embodiments for determining the root-causes of a performance objective violation, such as an end-to-end service level objection (SLO) violation, in a large-scale system with multi-tiered applications. This determination is made using a hybrid of component-level snapshots of the state of the system during a period in which an abnormal event occurred (i.e., black box mapping) and of known events and their causes (i.e., white-box mapping). Specifically, in response to a query about a violation (e.g., why did the response time for application a increase from r to r), a processor will access and correlate the black-box and white-box mappings to determine a short-list of probable causes for the violation.