A prototype of a pre-standard algorithm developed for HEVC called Massively Parallel CABAC that addresses a key bottleneck in the video decoder is implemented in a 65-nm CMOS process. The scalable testchip achieves a throughput of 24.11bins/cycle, which enables it to decode the max H.264/AVC bitrate (300Mb/s) with a 18MHz clock at 0.7V, consuming 12.3pJ/bin. At 1.0V, it decodes a peak of 3026Mbins/s for a bit-rate of 2.3Gb/s, which is enough for over seven 300Mb/s sequences or a 4kx2k resolution video at 186 fps. Joint algorithm and architecture optimizations are used to reduce critical path delay and memory requirements with little or no cost in coding efficiency.