Skip to main content

An Embedded Energy Monitoring Circuit for a 128kbit SRAM with Body-biased Sense-Amplifiers

Image of chip design
Embedded energy monitoring of critical system components can be used to enable better power management by capturing run time system conditions such as temperature and application load. In this work, an energy sensing circuit that provides digitally represented absolute energy per operation of a 128kbit SRAM is presented. Designed in a 65nm low-power CMOS process, SRAMs can operate down to 370 mV. Energy sensing circuit consumes 16.7µW during sensing at 1.2V (only 0.28% of SRAM active power at the same voltage). For improved performance, SRAMs utilize body-biased PMOS input strong-arm type sense amplifiers that can achieve 45% tighter input offset distribution for only ~3.5% of total SRAM area overhead.

An Embedded Energy Monitoring Circuit for a 128kbit SRAM with Body-biased Sense-Amplifiers

Image of chip design
Embedded energy monitoring of critical system components can be used to enable better power management by capturing run time system conditions such as temperature and application load. In this work, an energy sensing circuit that provides digitally represented absolute energy per operation of a 128kbit SRAM is presented. Designed in a 65nm low-power CMOS process, SRAMs can operate down to 370 mV. Energy sensing circuit consumes 16.7µW during sensing at 1.2V (only 0.28% of SRAM active power at the same voltage). For improved performance, SRAMs utilize body-biased PMOS input strong-arm type sense amplifiers that can achieve 45% tighter input offset distribution for only ~3.5% of total SRAM area overhead.

Reconfigurable Switched Capacitor DC-DC Converter using On-Chip Ferroelectric Capacitors

Image of chip design
A reconfigurable switched capacitor DC-DC converter featuring high density ferroelectric capacitors (Fe-Caps) for charge transfer is designed in this work. The converter supports four gain settings (1-2/3-1/2-1/3) to supply wide output voltage range and is split in four modules for output voltage ripple reduction. The control circuit exploits dynamic gain selection and Pulse Frequency Modulation (PFM) for efficient output voltage regulation of the multi-phased converter. The chip is fabricated in 130 nm CMOS process and the system occupies an area of 0.366 sq.mm. It supports output voltage of 0.4V to 1.1V from 1.5V input while delivering load current of 20µA to 1mA and achieves a peak efficiency of 93% including the control circuit overhead.

Reconfigurable Switched Capacitor DC-DC Converter using On-Chip Ferroelectric Capacitors

Image of chip design
A reconfigurable switched capacitor DC-DC converter featuring high density ferroelectric capacitors (Fe-Caps) for charge transfer is designed in this work. The converter supports four gain settings (1-2/3-1/2-1/3) to supply wide output voltage range and is split in four modules for output voltage ripple reduction. The control circuit exploits dynamic gain selection and Pulse Frequency Modulation (PFM) for efficient output voltage regulation of the multi-phased converter. The chip is fabricated in 130 nm CMOS process and the system occupies an area of 0.366 sq.mm. It supports output voltage of 0.4V to 1.1V from 1.5V input while delivering load current of 20µA to 1mA and achieves a peak efficiency of 93% including the control circuit overhead.

Reconfigurable Processor for Energy-Scalable Computational Photography

Image of chip design
Computational photography applications significantly extend and enhance the capabilites of existing cameras. The high computational complexity of such multimedia processing applications necessitates fast hardware implementations to allow real-time processing. This work implements a reconfigurable multi-application processor to enable energy-efficient real-time computational photography on portable multimedia devices. The reconfigurable hardware implements Bilateral filtering - a non-linear filtering technique with wide range of computational photography applications, and implements it using a Bilateral Grid structure, which represents an image using a 3D data structure and filters it using a 3D Gaussian kernel. The processor implements High Dynamic Range (HDR) imaging, Low-Light Enhancement, by merging flash and non-flash images such that the natural scene ambience is preserved while achieving high details and low noise, and Glare Reduction. The filtering engine can also be accessed from off-chip and used with other applications. The implementation significantly accelerates bilateral filtering and enables various edge-aware image processing applications in real-time on HD images. The processor, implemented using 40 nm CMOS technology, is operational from 25 MHz at 0.5 V to 98 MHz at 0.9 V. The testchip achieves 13 megapixel/s throughput while consuming 1.4 mJ/megapixel energy at 0.9 V - a significant energy reduction compared to CPU/GPU implementations.

Reconfigurable Processor for Energy-Scalable Computational Photography

Image of chip design
Computational photography applications significantly extend and enhance the capabilites of existing cameras. The high computational complexity of such multimedia processing applications necessitates fast hardware implementations to allow real-time processing. This work implements a reconfigurable multi-application processor to enable energy-efficient real-time computational photography on portable multimedia devices. The reconfigurable hardware implements Bilateral filtering - a non-linear filtering technique with wide range of computational photography applications, and implements it using a Bilateral Grid structure, which represents an image using a 3D data structure and filters it using a 3D Gaussian kernel. The processor implements High Dynamic Range (HDR) imaging, Low-Light Enhancement, by merging flash and non-flash images such that the natural scene ambience is preserved while achieving high details and low noise, and Glare Reduction. The filtering engine can also be accessed from off-chip and used with other applications. The implementation significantly accelerates bilateral filtering and enables various edge-aware image processing applications in real-time on HD images. The processor, implemented using 40 nm CMOS technology, is operational from 25 MHz at 0.5 V to 98 MHz at 0.9 V. The testchip achieves 13 megapixel/s throughput while consuming 1.4 mJ/megapixel energy at 0.9 V - a significant energy reduction compared to CPU/GPU implementations.

HEVC Video Decoder for 4K Ultra HD Applications

A video decoder chip supporting the High Efficiency Video Coding (HEVC) standard is designed in 40nm CMOS. The chip runs at 200 MHz at 0.9V with a throughput of 249 Mpixel/s to meet the requirements of 4K Ultra HD applications. Various architectural innovations are implemented in the chip to address large and variable pixel block sizes in HEVC and longer interpolation filters compared to the previous H.264/AVC standard. A motion compensation cache is designed to reduce the average bandwidth required from external memory by 67%. The chip consumes 78mW when decoding video at a resolution of 3840x2160 at 30 frames/s. The total system efficiency including simulated DRAM power is 1.19 nJ/pixel.

HEVC Video Decoder for 4K Ultra HD Applications

Image of chip design
A video decoder chip supporting the High Efficiency Video Coding (HEVC) standard is designed in 40nm CMOS. The chip runs at 200 MHz at 0.9V with a throughput of 249 Mpixel/s to meet the requirements of 4K Ultra HD applications. Various architectural innovations are implemented in the chip to address large and variable pixel block sizes in HEVC and longer interpolation filters compared to the previous H.264/AVC standard. A motion compensation cache is designed to reduce the average bandwidth required from external memory by 67%. The chip consumes 78mW when decoding video at a resolution of 3840x2160 at 30 frames/s. The total system efficiency including simulated DRAM power is 1.19 nJ/pixel.