New Compiler Framework Aligns Homomorphic Encryption with TPU Architecture
A team of researchers announced a compiler framework called CROSS that enables homomorphic encryption (HE) workloads to run efficiently on Google’s Tensor Processing Units (TPU v6e). The work, posted on arXiv in January 2025, aims to narrow the energy‑efficiency gap between general‑purpose GPUs and specialized HE ASICs by leveraging the high‑throughput, low‑precision matrix engine of TPUs.
Background and Motivation
Homomorphic encryption offers strong data privacy for cloud computing but typically incurs prohibitive computational costs. While GPUs have been used to accelerate HE, they still fall short of the energy efficiency achieved by dedicated ASICs, which are expensive to develop and deploy.
Challenges with Existing HE Implementations on TPUs
Current state‑of‑the‑art HE algorithms are optimized for GPUs and rely on 32‑bit integer arithmetic and fine‑grained data permutations. These characteristics clash with the TPU architecture, which features a coarse‑grained memory subsystem and a matrix multiplication unit (MXU) designed for 8‑bit operations. As a result, porting GPU‑optimized HE libraries to TPUs leads to under‑utilization of the MXU and significant performance degradation.
CROSS Framework Overview
The CROSS framework addresses these mismatches through systematic code transformations that align HE computations with TPU hardware. By converting high‑precision modular arithmetic into dense low‑precision matrix multiplications, CROSS unlocks the MXU’s capabilities for HE workloads.
Key Techniques: Basis‑Aligned and Memory‑Aligned Transformations
Basis‑Aligned Transformation (BAT) replaces 32‑bit modular operations with INT8 matrix multiplications, enabling the MXU to process HE primitives efficiently. Memory‑Aligned Transformation (MAT) embeds data reordering directly into compute kernels, eliminating costly runtime permutations and improving memory access patterns.
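To make the BAT idea concrete, here is a minimal NumPy sketch (not the paper's actual kernels; the function names and limb layout are illustrative assumptions). A 32‑bit operand is split into base‑256 limbs, so every pairwise limb product fits in the small‑operand range an INT8 matrix engine produces; the limb products are then recombined modulo q:

```python
import numpy as np

def to_limbs(x, n_limbs=4, base=256):
    """Split a 32-bit integer into base-256 limbs (least significant first)."""
    return np.array([(x >> (8 * i)) & 0xFF for i in range(n_limbs)], dtype=np.int64)

def modmul_via_limbs(a, b, q, n_limbs=4, base=256):
    """Compute a*b mod q using only 8-bit limb products -- a sketch of how
    high-precision modular multiplication can be re-expressed as the kind of
    small-operand products a low-precision matrix engine emits."""
    la, lb = to_limbs(a, n_limbs), to_limbs(b, n_limbs)
    # Outer product of limbs: each entry is at most 255*255, i.e. the
    # magnitude of an INT8 x INT8 product.
    prod = np.outer(la, lb)                      # shape (n_limbs, n_limbs)
    # Limb product (i, j) carries positional weight base^(i+j) mod q.
    weights = np.array([[pow(base, i + j, q) for j in range(n_limbs)]
                        for i in range(n_limbs)], dtype=np.int64)
    return int((prod * weights).sum() % q)

q = (1 << 31) - 1                # example 31-bit modulus, for checking only
a, b = 123456789, 987654321
assert modmul_via_limbs(a, b, q) == (a * b) % q
```

In a real BAT lowering, many such limb products would be batched into one dense matrix multiplication so the MXU stays fully utilized; the sketch shows only the arithmetic identity that makes the batching valid.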
Performance Evaluation
Benchmarking on TPU v6e shows that CROSS delivers higher throughput per watt for Number‑Theoretic Transform (NTT) and other HE operators compared with existing GPU‑focused libraries such as WarpDrive, FIDESlib, FAB, HEAP, and Cheddar. The results suggest that AI ASICs can become the state‑of‑the‑art platform for energy‑efficient HE processing.
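The NTT is a natural fit for matrix hardware because its dense form is simply multiplication by a matrix of powers of a root of unity. The sketch below (illustrative only; the paper's optimized kernels are not reproduced here) shows a forward and inverse length‑4 NTT as plain matrix‑vector products mod q:

```python
import numpy as np

def ntt_matrix(n, omega, q):
    """Dense NTT matrix W[i][j] = omega^(i*j) mod q."""
    return np.array([[pow(omega, i * j, q) for j in range(n)]
                     for i in range(n)], dtype=np.int64)

def ntt_via_matmul(a, omega, q):
    """Forward NTT expressed as a single matrix-vector product mod q --
    the formulation that maps directly onto a matrix-multiply unit."""
    W = ntt_matrix(len(a), omega, q)
    return (W @ np.asarray(a, dtype=np.int64)) % q

# Tiny example: q = 17, n = 4, and omega = 4 is a primitive 4th root of
# unity mod 17 (4^2 = 16 != 1, 4^4 = 256 = 1 mod 17).
q, n, omega = 17, 4, 4
a = [1, 2, 3, 4]
fwd = ntt_via_matmul(a, omega, q)
# The inverse NTT uses omega^{-1} and scales by n^{-1} mod q.
inv = (ntt_via_matmul(fwd, pow(omega, -1, q), q) * pow(n, -1, q)) % q
assert list(inv) == a
```

Production HE libraries use O(n log n) butterfly NTTs rather than this O(n²) dense form; the point of reformulating around matrix products is that, on an MXU, dense low‑precision matmuls can be cheap enough to trade asymptotic complexity for hardware utilization.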
Implications and Future Work
The study demonstrates that compiler‑level adaptations can repurpose AI accelerators for privacy‑preserving computation, potentially reducing reliance on costly ASIC designs. The authors indicate plans to extend CROSS to additional HE schemes and to explore integration with cloud service providers.
This report is based on the abstract of the research paper, an open‑access preprint; the full text is available via arXiv.