A high-frequency and low-power L1 cache and associated access technique. The method may include inspecting a virtual address of an L1 data cache load instruction, and indexing into a row and a column of a way predictor table using metadata and a virtual address associated with the load instruction. The method may include matching information stored at the row and the column of the way predictor table to a location of a cache line. The method may include predicting the location of the cache line within the L1 data cache based on the information match. A hierarchy of way predictor tables may be used, with higher-level way predictor tables refreshing smaller, lower-level way predictor tables. The way predictor tables may be trained to make better predictions over time. Only the circuit macros selected by the predictions need to be enabled, thereby saving power.
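As a concrete illustration of the lookup just described, the sketch below models a single way predictor table indexed by virtual-address bits and load metadata; the table geometry, hash choices, and partial-tag width are assumptions for illustration, not details taken from the abstract.

```python
# Minimal model of way prediction: a row derived from the virtual address
# and a column derived from load metadata index the table, and a matching
# entry predicts which way of the L1 holds the line, so only that way's
# data macro needs to be enabled.

WAYS = 4            # assumed L1 associativity
ROWS, COLS = 64, 8  # assumed way-predictor table geometry

way_table = [[None] * COLS for _ in range(ROWS)]  # entries start untrained

def predict_way(vaddr, metadata):
    """Return the predicted way, or None to fall back to enabling all ways."""
    row = (vaddr >> 6) % ROWS            # index bits above a 64-byte line offset
    col = metadata % COLS                # e.g. bits of the load's PC or context
    entry = way_table[row][col]
    partial_tag = (vaddr >> 12) & 0xFF   # assumed partial-tag width
    if entry is not None and entry["tag"] == partial_tag:
        return entry["way"]              # enable only this way's circuit macro
    return None                          # no match: read all ways and train

def train(vaddr, metadata, actual_way):
    """Update the table once the true hit way is known, improving later predictions."""
    row = (vaddr >> 6) % ROWS
    col = metadata % COLS
    way_table[row][col] = {"tag": (vaddr >> 12) & 0xFF, "way": actual_way}
```

In the hierarchical variant the abstract mentions, a larger table of this shape would periodically refresh a smaller, faster one sitting next to the load pipeline.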
Inventors: - Suwon-si, KR; Rama S. GOPAL - Austin, TX, US; Karthik SUNDARAM - Austin, TX, US
International Classification:
G06F 9/30 G06F 9/38 G06F 9/35 G06F 12/0875
Abstract:
A system and a method to cascade execution of instructions in a load-store unit (LSU) of a central processing unit (CPU) to reduce latency associated with the instructions. First data stored in a cache is read by the LSU in response to a first memory load instruction of two immediately consecutive memory load instructions. Alignment, sign extension, and/or endian operations are performed on the first data read from the cache in response to the first memory load instruction, and, in parallel, a memory-load address-forwarded result is selected based on a corrected alignment of the first data read in response to the first memory load instruction to provide a next address for the second of the two immediately consecutive memory load instructions. Second data stored in the cache is read by the LSU in response to the second memory load instruction based on the selected memory-load address-forwarded result.
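The key overlap described above, formatting the first load's data while simultaneously forwarding its aligned value as the second load's address, can be sketched as follows; the helper names and data layout are illustrative assumptions, and operations that run in parallel in hardware are necessarily written sequentially here.

```python
# Toy model of cascaded loads: the raw cache read feeds two consumers,
# the full formatting path (alignment + sign extension) and the
# address-forwarding path (alignment correction only).

def align(raw, offset, size):
    # Extract `size` bytes starting at byte `offset` of the raw cache read.
    return (raw >> (8 * offset)) & ((1 << (8 * size)) - 1)

def sign_extend(value, size):
    sign_bit = 1 << (8 * size - 1)
    return (value ^ sign_bit) - sign_bit

def cascaded_loads(cache, addr1, offset, size):
    raw = cache[addr1]                   # first load reads the cache
    # In hardware these two uses of the aligned data proceed in parallel:
    formatted = sign_extend(align(raw, offset, size), size)  # writeback result
    addr2 = align(raw, offset, size)     # corrected alignment, forwarded as
                                         # the second load's address
    second = cache[addr2]                # second load issues without waiting
                                         # for full formatting to finish
    return formatted, second
```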
System And Method Of Reducing Computer Processor Power Consumption Using Micro-BTB Verified Edge Feature
According to one general aspect, an apparatus may include a front end logic section comprising a main branch target buffer (BTB). The apparatus may also include a micro-BTB separate from the main BTB and configured to produce prediction information associated with a branching instruction and to mark that prediction information as verified when one or more conditions are satisfied. The front end logic section is configured to be at least partially powered down when the data stored by the micro-BTB that produced the prediction information has been marked as previously verified.
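A rough software model of the verified-edge mechanism follows; the confirmation threshold, the entry fields, and the stand-in power-gating hooks are assumptions made for illustration.

```python
# Sketch: a micro-BTB entry is marked verified after its prediction has
# been confirmed enough times; later hits on a verified entry let the
# larger front-end structures stay powered down.

VERIFY_THRESHOLD = 2   # assumed confirmations required before trusting

micro_btb = {}         # pc -> {"target": int, "confirms": int, "verified": bool}

def power_down_main_front_end():
    pass               # stand-in for gating the main BTB and fetch logic

def power_up_main_front_end():
    pass               # stand-in for re-enabling the main front end

def predict(pc):
    entry = micro_btb.get(pc)
    if entry and entry["verified"]:
        power_down_main_front_end()      # prediction previously verified
        return entry["target"]
    power_up_main_front_end()            # main BTB must check or supply it
    return entry["target"] if entry else None

def resolve(pc, actual_target):
    """Called when the branch resolves; trains and (un)verifies the entry."""
    entry = micro_btb.setdefault(
        pc, {"target": actual_target, "confirms": 0, "verified": False})
    if entry["target"] == actual_target:
        entry["confirms"] += 1
        entry["verified"] = entry["confirms"] >= VERIFY_THRESHOLD
    else:                                # mispredict: retrain and unverify
        entry.update(target=actual_target, confirms=0, verified=False)
```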
Inventors: - Suwon-si, KR; Paul E. KITCHIN - Austin, TX, US; Karthik SUNDARAM - Austin, TX, US
International Classification:
G06F 7/485
Abstract:
According to one general aspect, a load unit may include a load circuit configured to load at least one piece of data from a memory. The load unit may include an alignment circuit configured to align the data to generate aligned data. The load unit may also include a mathematical operation execution circuit configured to generate the result of a predetermined mathematical operation with the at least one piece of data as an operand. If an active instruction is associated with the predetermined mathematical operation, the load unit is configured to bypass the alignment circuit and input the piece of data directly to the mathematical operation execution circuit.
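As a hedged sketch of that bypass, the fragment below uses addition to stand in for the predetermined mathematical operation; the circuit interfaces are illustrative, not from the abstract.

```python
# Toy load unit: a plain load goes through the alignment circuit, while a
# fused load-plus-math instruction feeds the raw loaded data straight to
# the math circuit, skipping alignment.

def alignment_circuit(raw, offset, size):
    return (raw >> (8 * offset)) & ((1 << (8 * size)) - 1)

def math_circuit(value, operand):
    return value + operand   # assumed predetermined operation: addition

def load_unit(memory, addr, fused_math=False, operand=0):
    raw = memory[addr]                       # load circuit
    if fused_math:
        return math_circuit(raw, operand)    # bypass the alignment circuit
    return alignment_circuit(raw, offset=0, size=8)   # normal aligned load
```

Skipping the alignment stage removes a step from the load-to-execute path, which is the latency saving the abstract is after.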
Inventors: - Suwon-si, KR; Rama S. GOPAL - Austin, TX, US; Karthik SUNDARAM - Austin, TX, US
International Classification:
G06F 9/30 G06F 9/35 G06F 12/0875
Abstract:
A system and a method to cascade execution of instructions in a load-store unit (LSU) of a central processing unit (CPU) to reduce latency associated with the instructions. First data stored in a cache is read by the LSU in response to a first memory load instruction of two immediately consecutive memory load instructions. Alignment, sign extension, and/or endian operations are performed on the first data read from the cache in response to the first memory load instruction, and, in parallel, a memory-load address-forwarded result is selected based on a corrected alignment of the first data read in response to the first memory load instruction to provide a next address for the second of the two immediately consecutive memory load instructions. Second data stored in the cache is read by the LSU in response to the second memory load instruction based on the selected memory-load address-forwarded result.
Adaptive Mechanism To Tune The Degree Of Pre-Fetch Streams
Inventors: Arun RADHAKRISHNAN - Austin, TX, US; Karthik SUNDARAM - Austin, TX, US
International Classification:
G06F 12/08
Abstract:
According to one general aspect, a method may include monitoring a plurality of pre-fetch cache requests associated with a data stream. The method may also include evaluating an accuracy of the pre-fetch cache requests. The method may further include, based at least in part upon the accuracy of the pre-fetch cache requests, adjusting the maximum amount of data that may be pre-fetched in excess of the data stream's current actual demand for data.
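A minimal sketch of that accuracy-driven throttling appears below, assuming simple high/low accuracy watermarks evaluated over a window of requests; the thresholds and step sizes are illustrative, not from the patent text.

```python
# Sketch: track how many issued prefetches turn out to be useful, then
# raise or lower the allowed prefetch-ahead amount (the "degree") based
# on the measured accuracy.

class PrefetchThrottle:
    def __init__(self, min_degree=1, max_degree=8):
        self.degree = min_degree     # max data prefetched beyond demand
        self.min_degree = min_degree
        self.max_degree = max_degree
        self.issued = 0
        self.useful = 0              # prefetches later hit by demand accesses

    def record(self, was_useful):
        self.issued += 1
        self.useful += int(was_useful)

    def adjust(self):
        if self.issued == 0:
            return self.degree
        accuracy = self.useful / self.issued
        if accuracy > 0.75:          # accurate stream: run further ahead
            self.degree = min(self.degree + 1, self.max_degree)
        elif accuracy < 0.25:        # inaccurate stream: pull back
            self.degree = max(self.degree - 1, self.min_degree)
        self.issued = self.useful = 0   # start a fresh evaluation window
        return self.degree
```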
Inventors: - Suwon-si, KR; Kevin LEPAK - Austin, TX, US; Rama GOPAL - Austin, TX, US; Murali CHINNAKONDA - Austin, TX, US; Karthik SUNDARAM - Austin, TX, US; Brian GRAYSON - Austin, TX, US
International Classification:
G06F 12/08
Abstract:
According to one general aspect, an apparatus may include a cache pre-fetcher and a pre-fetch scheduler. The cache pre-fetcher may be configured to predict, based at least in part upon a virtual address, data to be retrieved from a memory system. The pre-fetch scheduler may be configured to convert the virtual address of the data to a physical address of the data, and to request the data from one of a plurality of levels of the memory system, each level of which is configured to store data.
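The scheduler's two jobs, translating the predicted address and picking a level to service the request, can be sketched as follows; the toy TLB, the Level class, and the nearest-level-first policy are assumptions for illustration.

```python
# Sketch: convert the pre-fetcher's virtual address to a physical one,
# then request the data from the nearest level that already holds it,
# falling back to backing memory.

PAGE_SHIFT = 12   # assumed 4-KB pages

class Level:
    """Toy memory level: a name plus the set of physical lines it holds."""
    def __init__(self, name, lines=()):
        self.name = name
        self.lines = set(lines)
    def contains(self, paddr):
        return paddr in self.lines
    def read(self, paddr):
        return (self.name, paddr)    # stand-in for returning the data

def schedule_prefetch(vaddr, tlb, levels):
    """tlb: dict of virtual page -> physical page; levels: nearest first."""
    vpn = vaddr >> PAGE_SHIFT
    if vpn not in tlb:
        return None                  # drop rather than fault on a prefetch
    paddr = (tlb[vpn] << PAGE_SHIFT) | (vaddr & ((1 << PAGE_SHIFT) - 1))
    for level in levels:
        if level.contains(paddr):
            return level.read(paddr)
    return levels[-1].read(paddr)    # otherwise fetch from backing memory
```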
Inventors: - Suwon-si, KR; Karthik SUNDARAM - Austin, TX, US; Brian GRAYSON - Austin, TX, US
International Classification:
G06F 12/08
Abstract:
According to one general aspect, a method may include receiving, by a pre-fetch unit, a demand to access data stored at a memory address. The method may include determining if a first portion of the memory address matches a prior defined region of memory. The method may further include determining if a second portion of the memory address matches a previously detected pre-fetched address portion. The method may also include, if the first portion of the memory address matches the prior defined region of memory, and the second portion of the memory address matches the previously detected pre-fetched address portion, confirming that a pre-fetch pattern is associated with the memory address.
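The two-part match can be sketched compactly; the region size and the exact bit split between the two address portions are assumptions.

```python
# Sketch: the high portion of the address selects a tracked region, and
# the low portion is compared against offsets already pre-fetched there;
# a hit on both confirms the pattern.

REGION_SHIFT = 12                       # assumed 4-KB regions
OFFSET_MASK = (1 << REGION_SHIFT) - 1

tracked = {}                            # region -> set of pre-fetched offsets

def confirm_pattern(addr):
    """Return True when a demand access confirms a known pre-fetch pattern."""
    region = addr >> REGION_SHIFT       # first portion: region match
    offset = addr & OFFSET_MASK         # second portion: offset match
    return region in tracked and offset in tracked[region]

def note_prefetch(addr):
    """Record a pre-fetched address so later demands can confirm against it."""
    tracked.setdefault(addr >> REGION_SHIFT, set()).add(addr & OFFSET_MASK)
```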
Experience:
Samsung Austin R&D Center since Aug 2010
Staff Design Engineer
Advanced Micro Devices Sep 2003 - Aug 2010
Senior Design Engineer
ARM Jan 2003 - Jun 2003
Design Intern
Education:
The University of Texas at Austin 2001 - 2003
MS, Electrical and Computer Engineering
Indian Institute of Technology, Madras 1997 - 2001
BTech, Electrical Engineering