Cross-User KV Sharing Cuts Cache Size to 0.8% in Sequential Recommendation Models
A new method named CollectiveKV has been introduced to address latency and storage concerns in sequential recommendation…