[Notes] (SIGIR2022) CAUM: News Recommendation with Candidate-aware User Modeling

Haren Lin
4 min read · May 17, 2022

Paper Link

https://arxiv.org/pdf/2204.04726.pdf

Observation

Existing methods first model user interests and candidate news content separately and then use their representations for interest matching. In these methods, user interests are modeled in a candidate-agnostic way. However, each user usually has multiple interests, and it may be difficult to accurately match candidate news with a specific user interest if candidate news is not considered in user modeling.

In short: pulling the candidate news into user representation learning may work better than encoding user interests on their own!

Existing Methods (Left: NRMS; Right: NPA)

Idea

The authors' main arguments:

  1. A user usually has multiple interests; however, a candidate news article usually matches only a small part of them. In the figure above, the user has multiple interests, including politics, music, sports, and travel. The 4th candidate news only matches the user's interest in politics and is less relevant to the other interests (e.g., music and sports). Thus, if candidate news information is not considered in user modeling, it may be difficult to match candidate news accurately.
  2. Local contexts of users’ news click behaviors are useful for inferring short-term user interests. From the figure above, the relatedness between the 12th and 13th clicked news suggests the user's recent short-term interest in traveling to Wisconsin.
  3. Long-range relatedness between users’ historical clicks also provides rich information for modeling long-term user interests. The long-range relatedness between the 5th and 10th clicked news indicates the user's long-term interest in music.

Thus, understanding both short- and long-term user interests is important for accurate news recommendation.

Reading this, the first thing that came to my mind was the model proposed in the LSTUR paper. After skimming CAUM, it turns out to be essentially an evolved version of LSTUR!

Recap: LSTUR

Model Architecture

The overall Candidate-Aware User Modeling architecture consists of three main blocks: (1) the Candi-SelfAtt module, (2) the Candi-CNN module, and (3) the Candi-Attention module.

CAUM Model Architecture

Part 1. Candi-SelfAtt Module (Long Term Interest Modeling)

First, the Candi-SelfAtt module. The long-range contexts of news clicks usually provide important information for inferring global user interests. Moreover, different long-range behavior contexts usually have different importance for capturing different global user interests, which is why attention is needed.

Step 1: apply multi-head self-attention (MHA) to the user's click history. We start with each user's clicked-news representations [C1, C2, …, CN]. Each clicked news first goes through a linear transformation (Qu) to obtain its query vector. In parallel, the clicked-news representations go through another linear transformation (Wr) to obtain temporary clicked-news representations. Taking the dot product between the queries and these temporary representations gives the attention score between the i-th and the j-th clicked news.

Step 2: take the candidate news (nc) into account. It goes through a linear transformation (Qc) to obtain its query vector (qc). This query is likewise dotted with the temporary clicked-news representations to obtain the attention score between the candidate news and the j-th clicked news, which is then added to the score from Step 1 to obtain the final attention scores.

Step 3: use the attention scores from the previous two steps for attention pooling, and pass the result through another linear transformation (Wo) to obtain the k-th head's output news representation (li_k).

All of the above is computed with multiple heads; finally, the outputs of the K heads are concatenated to obtain the final news representations li, i from 1 to N → [l1, l2, l3, …, lN].

Put simply, this block uses not only the clicked news themselves but also the candidate news as queries, and applies attention to obtain the news representations.
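
To make the three steps concrete, here is a minimal PyTorch sketch of a possible Candi-SelfAtt implementation. The class name, tensor shapes, and the exact placement of Wo are my own assumptions, not the paper's official code.

```python
import torch
import torch.nn as nn

class CandiSelfAtt(nn.Module):
    """Candidate-aware multi-head self-attention (a sketch, not the official code)."""
    def __init__(self, d_model=400, n_heads=20):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.Qu = nn.Linear(d_model, d_model)  # query transform for clicked news
        self.Qc = nn.Linear(d_model, d_model)  # query transform for candidate news
        self.Wr = nn.Linear(d_model, d_model)  # "temporary" clicked-news transform
        self.Wo = nn.Linear(d_model, d_model)  # output transform (per head in the
                                               # paper; applied jointly here for brevity)

    def forward(self, clicks, cand):
        # clicks: [B, N, d_model], cand: [B, d_model]
        B, N, _ = clicks.shape
        split = lambda x: x.view(B, -1, self.n_heads, self.d_head).transpose(1, 2)
        q = split(self.Qu(clicks))              # [B, K, N, d_head]
        qc = split(self.Qc(cand).unsqueeze(1))  # [B, K, 1, d_head]
        r = split(self.Wr(clicks))              # [B, K, N, d_head]
        # Step 1 scores (click-to-click) + Step 2 scores (candidate-to-click).
        scores = q @ r.transpose(-2, -1) + qc @ r.transpose(-2, -1)  # [B, K, N, N]
        attn = scores.softmax(dim=-1)
        # Step 3: attention pooling, then concatenate the K heads and project.
        out = (attn @ r).transpose(1, 2).reshape(B, N, -1)
        return self.Wo(out)                     # [l1, ..., lN]: [B, N, d_model]
```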

Part 2. Candi-CNN Module (Short Term Interest Modeling)

The second part is the Candi-CNN module. The use of a CNN here makes it clear that it captures local contexts; what is special here is that the candidate news is incorporated as well.

Step 1: set the CNN window size to 2h+1. That is, for the i-th clicked news, we jointly look at the h news to its left, the h news to its right, and the candidate news nc to obtain the i-th local contextual representation si. Wc is the weight of the CNN filter; after sliding over the whole click history we obtain [s1, s2, …, sN].

Step 2: combine the outputs of the Candi-SelfAtt module and this Candi-CNN module, namely li and si for i from 1 to N. We first concatenate si and li and then apply a linear transformation (Pm) to obtain mi, the final representation of each clicked news.
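
Continuing the sketch above (same imports and shapes), here is one plausible reading of the candidate-aware convolution and the fusion step. Flattening each window and appending the candidate vector to it is my simplification of the paper's formulation.

```python
class CandiCNN(nn.Module):
    """Candidate-aware CNN over the click history, plus fusion with l_i (a sketch)."""
    def __init__(self, d_model=400, h=1):
        super().__init__()
        # Each window holds 2h+1 clicked news plus the candidate news, flattened.
        self.Wc = nn.Linear((2 * h + 2) * d_model, d_model)  # CNN filter weight
        self.Pm = nn.Linear(2 * d_model, d_model)            # fuses s_i with l_i
        self.h = h

    def forward(self, clicks, cand, l):
        # clicks: [B, N, d], cand: [B, d], l: [B, N, d] from Candi-SelfAtt
        B, N, d = clicks.shape
        pad = clicks.new_zeros(B, self.h, d)                 # zero-pad the borders
        windows = torch.cat([pad, clicks, pad], 1).unfold(1, 2 * self.h + 1, 1)
        windows = windows.transpose(2, 3).reshape(B, N, -1)  # [B, N, (2h+1)*d]
        cand_rep = cand.unsqueeze(1).expand(B, N, d)         # candidate joins every window
        s = torch.relu(self.Wc(torch.cat([windows, cand_rep], -1)))  # [s1, ..., sN]
        # Step 2: concatenate local (s_i) and long-range (l_i), then project to m_i.
        return self.Pm(torch.cat([s, l], -1))                # [m1, ..., mN]: [B, N, d]
```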

Part 3. Candi-Attention Module

The third part does something simple: it applies attention pooling over the clicked-news representations (mi) generated above to obtain the user representation (u). The attention score for the i-th clicked news comes from the similarity between mi and the candidate news nc (Φ is an MLP network).
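
A matching sketch of the candidate-aware attention pooling, continuing the same code; the exact form of Φ is my guess (a small two-layer MLP):

```python
class CandiAttention(nn.Module):
    """Candidate-aware attention pooling into a user vector (a sketch)."""
    def __init__(self, d_model=400):
        super().__init__()
        # Phi: an MLP scoring the similarity between m_i and the candidate news.
        self.phi = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.Tanh(), nn.Linear(d_model, 1))

    def forward(self, m, cand):
        # m: [B, N, d], cand: [B, d]
        cand_rep = cand.unsqueeze(1).expand_as(m)
        scores = self.phi(torch.cat([m, cand_rep], -1)).squeeze(-1)  # [B, N]
        alpha = scores.softmax(dim=-1)
        return (alpha.unsqueeze(-1) * m).sum(1)  # user representation u: [B, d]
```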

Part 4. News Encoder & Interest Matching

Finally, the model parameters are updated via the BPR loss until convergence.
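
With the interest-matching score taken as the dot product between the user vector u and the candidate vector nc, the BPR objective over a (clicked, non-clicked) candidate pair might look like this; the function and argument names are mine:

```python
import torch.nn.functional as F

def bpr_loss(pos_score, neg_score):
    """BPR loss: push the clicked candidate's score above the non-clicked one's.
    pos_score, neg_score: [B] dot products <u, n_c> for positive/negative candidates."""
    return -F.logsigmoid(pos_score - neg_score).mean()
```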

Training Details: In CAUM, the dimensions of both news and user interest representations are set to 400. Candi-SelfAtt contains 20 attention heads, and the output vectors of each head are 20-dimensional. Candi-CNN contains 400 filters, and the window size is set to 3. We train CAUM for 3 epochs via Adam with a 5e-5 learning rate. All hyper-parameters of CAUM and the other baseline methods are selected based on the validation dataset.
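
For reference, the reported hyper-parameters gathered into one place (the dict keys are my own naming):

```python
# Hyper-parameters reported in the paper; key names are mine.
config = dict(
    d_model=400,      # news / user representation dimension
    n_heads=20,       # Candi-SelfAtt heads, 20-dim output each (20 * 20 = 400)
    n_filters=400,    # Candi-CNN filters
    window_size=3,    # CNN window 2h+1, i.e. h = 1
    epochs=3,
    optimizer="Adam",
    lr=5e-5,
)
```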

Experiments

Ablation Study

First, adding either Candi-SelfAtt or Candi-CNN significantly improves the performance of Base. This result verifies that Candi-SelfAtt and Candi-CNN can effectively exploit behavior contexts to capture global and short-term user interests, respectively, which are informative for matching the candidate news. Second, CAUM outperforms both Base+Candi-CNN and Base+Candi-SelfAtt. This is because candidate-aware user interest modeling matters for both global and short-term user interests.

This article may be updated at any time. Thanks for reading! If you like the content, please click the “clap” button. You can also press the follow button to track new articles. Feel free to connect with me via LinkedIn or email.


Haren Lin

MSWE @ UC Irvine | MSCS @ NTU GINM | B.S. @ NCCU CS x B.A. @ NCCU ECON | ex-SWE intern @ TrendMicro