[Notes] (SIGIR2021) DUMN: Deep User Match Network for Click-Through Rate Prediction

Haren Lin
6 min read · Jun 23, 2022


Paper Link

Source Code

Observation & Idea

  1. Recently, deep learning-based models have been proposed and successfully applied to CTR prediction, focusing on feature interaction or on user interest modeled via the item-to-item relevance between user behaviors and the candidate item.
  2. However, these existing models neglect the user-to-user relevance between the target user and the users who like the candidate item, which can also reflect the target user's preference.

DUMN: match the target user against the users who like the candidate item (i.e., those who have interacted with the candidate item in this work) to measure the user-to-user relevance.

  • Embedding Layer. (for user behaviors and profile features)
  • User Representation Layer. Learn a unified user representation which contains user latent interest based on user behaviors and profile.
  • User Match Layer. Measure the user-to-user relevance by matching the target user and those who have interacted with candidate item and modeling their similarities in user representation space. With the user-to-user similarity relevance, generate the matching representation to reflect the preference of target user.
  • CTR Prediction Layer. The matching representation and user representation of the target user, the user-to-user similarity relevance, and the embedding vectors of the candidate item and context are concatenated and fed into several FC layers to output the predicted CTR.

Main Contribution

  1. Highlighted the importance of capturing the user-to-user relevance between the target user and those who like the candidate item.
  2. Designed two network architectures to learn a unified user representation and model the user-to-user relevance with the similarities based on user representations, respectively.

Model Architecture

Part1. Notations

  1. User Behavior: Given a user u, the user behavior Iu is the sequential list of interacted items with corresponding features such as item id, category, etc. (the user's past click records, a list of Nu items)
ik is the k-th interacted item, and Nu is the length of Iu.

2. Item Behavior: Given an item m, the item behavior Um is the sequential list of users who interacted with m, with corresponding profile features such as user id, age, etc., as well as their user behaviors at the time of interacting with m. (the item's interaction history: L pairs of a user id and that user's behavior list)

uk is the k-th interacted user, Iuk is the user behavior of uk, and L is the length of Um.

The goal of CTR prediction is to use a model F to estimate the probability p that user u clicks the candidate item m, given the user behavior Iu, item behavior Um, user features u, item features m, and context c.

Part2. Embedding Layer

Transform the input features of DUMN to embedding vectors.

  1. Encode each feature into a one-hot vector with high-dimensional sparse binary encoding.
  2. The one-hot vectors are then transformed into low-dimensional dense vectors by the embedding layer.

The inputs we have are: user features u, item features m, context c, user behavior Iu, and item behavior Um.

  • Profile of user features u → eu.
  • Candidate item features m → em.
  • Context c → ec.
  • Item (historical) behavior Um → Zm = [(eu1, Xu1), (eu2, Xu2), …, (euL, XuL)]. (L is the length of item behaviors)
  • User (historical) behavior = User Interacted Items Iu → Xu = [ei1, ei2, …, eiNu]. (Nu is the length of user behaviors)
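The lookups above can be sketched in PyTorch. This is a minimal illustration, not the paper's code: the vocabulary sizes, embedding dimension, and the toy ids are all assumptions, and each feature is reduced to a single id for brevity.

```python
import torch
import torch.nn as nn

# Toy vocabulary sizes and embedding dimension (illustrative only).
NUM_USERS, NUM_ITEMS, EMB_DIM = 1000, 5000, 16

user_emb = nn.Embedding(NUM_USERS, EMB_DIM)  # user profile features u -> e_u
item_emb = nn.Embedding(NUM_ITEMS, EMB_DIM)  # item features m -> e_m

u = torch.tensor([42])                # target user id
m = torch.tensor([7])                 # candidate item id
I_u = torch.tensor([[3, 8, 15, 99]])  # user behavior, N_u = 4 interacted items

e_u = user_emb(u)    # (1, EMB_DIM)
e_m = item_emb(m)    # (1, EMB_DIM)
X_u = item_emb(I_u)  # (1, N_u, EMB_DIM) -- [e_i1, ..., e_iNu]
```

The one-hot step never materializes in practice: `nn.Embedding` is exactly a one-hot-times-matrix lookup done by index. The item behavior Zm would be built the same way, once per user in Um.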

After the embedding layer, we obtain eu, em, ec, Xu, and Zm.

Part3. User Representation Layer (i2i)

User Representation Layer in DUMN
  • With the embedding vectors of the user behavior and profile obtained via the Embedding Layer, the User Representation Layer learns a unified user representation by combining the user's latent interest and profile features.
  • It leverages an attention mechanism to measure the item-to-item relevance based on item embedding representations, and obtains the user interest representation according to that relevance.

Using the candidate item embedding (em), compute an attention score γk (k from 1 to Nu) against each item in the user behaviors, then pass the γk through a softmax to obtain the attention weights μk.

eik is the embedding vector of the k-th interacted item in the user behavior of u, γk is the item-to-item relevance between the k-th interacted item and the candidate item, μk is the attention weight, and σ is the sigmoid function.

Take the weighted sum of each clicked item embedding (eik) in the user behaviors with its attention weight (μk) to obtain the user interest representation (ru_hat).

Then concatenate the user interest representation (ru_hat) with the target user u's profile embedding (eu) to obtain the final (candidate-aware) user representation (ru).

P.S. There are L such representations in total, ru1, ru2, …, ruL, because the item behavior has length L: L users, each with a corresponding user behavior.
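The attention steps above can be sketched as follows. This is a hedged sketch, not the paper's exact scoring function: here an assumed small MLP with a sigmoid hidden activation scores each (behavior item, candidate item) pair to produce γk, and the rest follows the described softmax, weighted sum, and concatenation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def user_representation(X_u, e_m, e_u, att_mlp):
    """Candidate-aware user representation r_u (illustrative sketch).

    X_u : (B, N_u, d) item embeddings of the user behavior
    e_m : (B, d)      candidate item embedding
    e_u : (B, d)      user profile embedding
    """
    B, N_u, d = X_u.shape
    # gamma_k: score each behavior item against the candidate item.
    # The scoring network here is an assumption for illustration.
    e_m_tiled = e_m.unsqueeze(1).expand(-1, N_u, -1)      # (B, N_u, d)
    gamma = att_mlp(torch.cat([X_u, e_m_tiled], dim=-1))  # (B, N_u, 1)
    mu = F.softmax(gamma, dim=1)                          # attention weights
    r_hat = (mu * X_u).sum(dim=1)                         # user interest, (B, d)
    return torch.cat([r_hat, e_u], dim=-1)                # r_u, (B, 2d)

d = 16
att_mlp = nn.Sequential(nn.Linear(2 * d, 8), nn.Sigmoid(), nn.Linear(8, 1))
r_u = user_representation(torch.randn(2, 5, d), torch.randn(2, d),
                          torch.randn(2, d), att_mlp)
```

The same function is reused L times to build ru1…ruL for the users in the item behavior, each time with their own behavior sequence and profile embedding.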

Part4. User Match Layer (u2u)

Personally, I think this part to some extent implicitly embodies the idea of User-Based Collaborative Filtering (UserCF).

DUMN Main Architecture
  • User Match Layer aims at measuring the user-to-user relevance by matching target user and users in the item behavior of candidate item and modeling their similarities with a Relevance Unit.
  • With the user-to-user similarity relevance, User Match Layer also generates a matching representation to reflect the preference of target user.

Part4–1. Relevance Unit
- Input: the representations of user a and user b. (ruk & ru)
- Output: the similarity of user a and user b.

Through the Relevance Unit, we obtain the user-to-user similarity relevance αk between the target user u and each user uk in the candidate item's item behavior.

Part4–2. Matching Representation for target user u

Take the weighted sum of the user representations (ruk) with the u2u relevance scores (αk) to obtain the matching representation for the target user u (Su).

Part4–3. Total User-to-user Similarity Relevance

The sum of the user-to-user similarity relevances αk, for k from 1 to L, is denoted Ru.
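Parts 4–1 through 4–3 can be sketched together. The Relevance Unit's internal architecture is an assumption here (a small MLP over the concatenated pair of user representations, with a sigmoid output); the weighted sum for Su and the total relevance Ru follow the description above.

```python
import torch
import torch.nn as nn

def user_match(r_u, R_users, rel_unit):
    """User Match Layer sketch (Relevance Unit architecture assumed).

    r_u      : (B, d)    target user representation
    R_users  : (B, L, d) representations r_u1..r_uL of users in the
                         candidate item's item behavior
    rel_unit : network scoring the similarity of two user representations
    """
    B, L, d = R_users.shape
    r_tiled = r_u.unsqueeze(1).expand(-1, L, -1)             # (B, L, d)
    # alpha_k: user-to-user similarity relevance from the Relevance Unit.
    alpha = rel_unit(torch.cat([r_tiled, R_users], dim=-1))  # (B, L, 1)
    S_u = (alpha * R_users).sum(dim=1)                       # matching repr, (B, d)
    R_u = alpha.sum(dim=1)                                   # total relevance, (B, 1)
    return S_u, R_u

d = 32
rel_unit = nn.Sequential(nn.Linear(2 * d, 8), nn.ReLU(),
                         nn.Linear(8, 1), nn.Sigmoid())
S_u, R_u = user_match(torch.randn(2, d), torch.randn(2, 6, d), rel_unit)
```

Note how this mirrors UserCF: Su is a relevance-weighted aggregate of similar users' representations, and Ru summarizes how similar the target user is to the item's audience overall.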

Part5. Output Layer

  • The Output Layer calculates the predicted probability that the target user u clicks the candidate item m.

[Step1] Concatenate the matching representation (Su), the user representation of the target user (ru), the total user-to-user relevance (Ru), and the embedding vectors of the candidate item and the context (em & ec) into one high-dimensional vector.

[Step2] Feed it into two FC layers with the PReLU activation function.

[Step3] Apply a linear transform to the last hidden layer, then a sigmoid, to obtain the predicted CTR p.

P.S. Training uses the NLL (negative log-likelihood) loss.

NLL Loss
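The three steps plus the loss can be sketched as below. The hidden widths (64, 32) and batch size are assumptions; for a binary click label the NLL loss is equivalent to binary cross-entropy, which is what the sketch uses.

```python
import torch
import torch.nn as nn

# Illustrative inputs; dimensions are assumptions, not the paper's.
d, B = 16, 4
S_u, r_u = torch.randn(B, 2 * d), torch.randn(B, 2 * d)  # matching / user repr
R_u = torch.randn(B, 1)                                  # total u2u relevance
e_m, e_c = torch.randn(B, d), torch.randn(B, d)          # item / context embs

# [Step1] concatenate everything into one high-dimensional vector.
x = torch.cat([S_u, r_u, R_u, e_m, e_c], dim=-1)

# [Step2] two FC layers with PReLU; [Step3] linear transform + sigmoid.
head = nn.Sequential(
    nn.Linear(x.shape[-1], 64), nn.PReLU(),
    nn.Linear(64, 32), nn.PReLU(),
    nn.Linear(32, 1),
)
p = torch.sigmoid(head(x)).squeeze(-1)  # predicted CTR, one per example

# NLL for binary click labels = binary cross-entropy.
y = torch.randint(0, 2, (B,), dtype=torch.float32)
loss = nn.functional.binary_cross_entropy(p, y)
```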

Experiments

Datasets: Amazon product reviews and metadata → Sports, Beauty, and Grocery datasets.

In all three datasets, the product reviews consist of user, item, rating, and timestamp information. 85% of the data is used for training and 15% for testing.

[Step1] Treat all existing rating reviews as positive interactions in which the click label is 1.

[Step2] Filter out users and items that have fewer than 5 reviews.

[Step3] Sort the reviews by the timestamp to build the user behaviors and item behaviors.

[Step4] Negative Sampling: Build negative samples by replacing the item in each review with another item randomly selected from the user's non-clicked item set.
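Step 4 can be sketched as follows. The `reviews` format (a list of (user, item) pairs) and the helper name are assumptions for illustration; the paper only specifies the replacement rule.

```python
import random

def build_negatives(reviews, all_items, seed=0):
    """For each positive review (user, item), create a negative sample by
    replacing the item with a random item the user never clicked.

    reviews   : list of (user, item) pairs (assumed format)
    all_items : the full item set to sample replacements from
    """
    rng = random.Random(seed)
    clicked = {}
    for u, i in reviews:
        clicked.setdefault(u, set()).add(i)
    negatives = []
    for u, i in reviews:
        # Candidate replacements: items this user has never interacted with.
        candidates = [it for it in all_items if it not in clicked[u]]
        negatives.append((u, rng.choice(candidates)))  # click label 0
    return negatives

pos = [("u1", "a"), ("u1", "b"), ("u2", "a")]
neg = build_negatives(pos, ["a", "b", "c", "d"])
```

This yields a 1:1 positive-to-negative ratio, one negative per review, which matches the described construction.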

Datasets Statistics
Performance comparison on three datasets. The percentage in the last row of DUMN is the relative improvement compared to the best baseline with t-test at p-value of 0.05.
Influence of the maximum length Lmax of item behavior on DUMN in the training phase on Beauty dataset.

