What does keras.layers.Embedding actually do?

6 min read · Jul 1, 2021

image source: https://unsplash.com/photos/w7ZyuGYNpRQ

What does keras.layers.Embedding actually do? This layer kept showing up in model definitions while I was reading TensorFlow Keras code recently, so let's take a proper look at it!

tf.keras.layers.Embedding(
    input_dim,
    output_dim,
    embeddings_initializer="uniform",
    embeddings_regularizer=None,
    activity_regularizer=None,
    embeddings_constraint=None,
    mask_zero=False,
    input_length=None,
    **kwargs
)

1. Turns positive integers (indices) into dense vectors of fixed size.
2. This layer can only be used as the first layer in the model.

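The Embedding layer is used as the first layer of a natural language model. As the name suggests, it turns the integer ID that stands for each word into a vector of a length you specify.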

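Traditional approaches represent each word with a very high-dimensional vector, and the vectors for a document form a sparse matrix that is mostly zeros. That is clearly not a good way to represent text. Put simply, the Embedding layer improves on the bag-of-words model representation: each word is instead represented by a dense vector of fixed length, and you choose that length yourself. The position of each word in this vector space is learned from text. Examples of this design include Word2Vec (CBOW and Skip-Gram) and GloVe: Global Vectors for Word Representation.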
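To make the contrast concrete, here is a minimal NumPy sketch (not Keras itself). The vocabulary size, the word IDs, and the random table are all made-up assumptions; in a real model the table is learned during training. It shows that looking up a row in a dense table gives the same result as multiplying a one-hot vector by that table, only without ever building the sparse matrix.

```python
import numpy as np

vocab_size = 5      # hypothetical vocabulary: word IDs 0..4
embedding_dim = 3   # vector length, chosen freely

word_ids = np.array([3, 0, 3])  # a tiny "sentence" of three word IDs

# Sparse representation: one row per word, a single 1 at the word's ID.
one_hot = np.eye(vocab_size)[word_ids]            # shape (3, 5)

# Dense representation: a table with one row per word.
# Random here; an Embedding layer would learn these values.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(vocab_size, embedding_dim))

# The embedding is just a row lookup by word ID.
dense = embedding_table[word_ids]                 # shape (3, 3)

# The lookup matches one-hot times table, without the big sparse matrix.
same = np.allclose(one_hot @ embedding_table, dense)
print(one_hot.shape, dense.shape, same)
```

Note that the first and third rows of `dense` are identical, because both positions hold word ID 3.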

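So how do we use it correctly? Below are the three parameters you will set most often. Strictly speaking, only input_dim and output_dim are required.

The first is input_dim: the number of distinct values the vocabulary can take, i.e. the number of unique tokens in the text. If, after tokenization, the unique tokens are encoded as 0–9, the total is 10.
The second is output_dim: the size of the vector space the words are embedded in. It sets the output dimension for every word, for example 32 or 100, and is treated as a hyperparameter.
The third is input_length: the official default is None, but I recommend setting it by hand. A common misunderstanding: it is not the number of text samples you pass in (that is the batch size) but the uniform length of each input sequence, usually whatever you chose as Max_Seq_Len. Note that Keras 3 deprecates this argument, so on recent versions you can simply leave it out.
In general, this is how I remember it: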

Embedding(input_dim=vocabulary_size, output_dim=embedding_dimension, input_length=max_seq_len)
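As a rough illustration of where these three values come from, the plain-Python sketch below derives them from a toy corpus. The corpus, the `word_index` mapping, and the `encode` helper are hypothetical names for this example only. It assumes IDs start at 1 so that 0 is left free for padding, which is why the vocabulary size gets a +1.

```python
# Toy corpus, already tokenized (hypothetical data).
corpus = [
    ["i", "like", "cats"],
    ["i", "like", "dogs", "too"],
]

# Assign IDs starting at 1 so that 0 stays free for padding.
word_index = {}
for sentence in corpus:
    for word in sentence:
        if word not in word_index:
            word_index[word] = len(word_index) + 1

vocabulary_size = len(word_index) + 1        # largest ID + 1 -> input_dim
max_seq_len = max(len(s) for s in corpus)    # uniform length -> input_length
embedding_dimension = 8                      # hyperparameter -> output_dim

def encode(sentence):
    """Map words to IDs and right-pad with 0 up to max_seq_len."""
    ids = [word_index[w] for w in sentence]
    return ids + [0] * (max_seq_len - len(ids))

encoded = [encode(s) for s in corpus]
print(vocabulary_size, max_seq_len)  # 6 4
print(encoded)                       # [[1, 2, 3, 0], [1, 2, 4, 5]]
```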

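At a high level, the input is a 2D tensor with shape (batch_size, input_length), and the output is a 3D tensor with shape (batch_size, input_length, output_dim). So for each individual sample, the Embedding layer's output is a 2D matrix of shape (input_length, output_dim). If you want a Dense layer to see the whole sequence at once, you need to flatten it with Flatten first.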
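The shapes can be checked with a short sketch, assuming TensorFlow 2.x is installed. The sizes and the input IDs below are arbitrary values picked for illustration.

```python
import numpy as np
import tensorflow as tf

vocabulary_size, embedding_dimension = 6, 8

# Two padded sequences of length 4: shape (batch_size, input_length) = (2, 4).
x = np.array([[1, 2, 3, 0],
              [1, 2, 4, 5]])

embedding = tf.keras.layers.Embedding(input_dim=vocabulary_size,
                                      output_dim=embedding_dimension)

embedded = embedding(x)                      # (batch, input_length, output_dim)
flat = tf.keras.layers.Flatten()(embedded)   # (batch, input_length * output_dim)
out = tf.keras.layers.Dense(1)(flat)         # (batch, 1)

print(embedded.shape)  # (2, 4, 8)
print(flat.shape)      # (2, 32)
print(out.shape)       # (2, 1)
```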

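Other parameters (rarely needed in practice):

embeddings_initializer: initializer for the embeddings matrix (see keras.initializers)
embeddings_regularizer: regularizer applied to the embeddings matrix (see keras.regularizers)
activity_regularizer: regularizer applied to the layer's output (see keras.regularizers)
embeddings_constraint: constraint function applied to the embeddings matrix (see keras.constraints)
mask_zero: Boolean, whether the input value 0 is a special padding value that should be masked out. This is very useful with RNNs, since recurrent layers may take variable-length input. If it is True, all subsequent layers in the model need to support masking, otherwise an exception is raised. In addition, when it is True, index 0 cannot be used in the vocabulary (input_dim should equal vocabulary size + 1). In short, you may need this with RNNs, where it keeps padding from interfering with the real data.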
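To see what mask_zero produces, the sketch below (assuming TensorFlow 2.x; the padded batch is made up) asks the layer for the mask it would pass to downstream layers. Positions holding the padding ID 0 come out as False.

```python
import numpy as np
import tensorflow as tf

# Two sequences right-padded with 0 to a common length of 4 (hypothetical IDs).
padded = np.array([[1, 2, 3, 0],
                   [4, 5, 0, 0]])

# Real word IDs run 1..5, so input_dim is vocabulary size + 1 = 6.
embedding = tf.keras.layers.Embedding(input_dim=6, output_dim=2, mask_zero=True)

# compute_mask returns a boolean tensor: True for real tokens, False for padding.
mask = np.array(embedding.compute_mask(padded))
print(mask)
```

Mask-aware layers such as LSTM use this mask to skip the padded timesteps.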

Here is an example program to help understand what the Embedding layer does.
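One possible minimal sketch, assuming TensorFlow 2.x is installed (the sizes and word IDs are arbitrary and not taken from the author's own code): it passes a few word IDs through the layer and compares the result with the layer's weight matrix, which shows that the layer is essentially a trainable lookup table.

```python
import numpy as np
import tensorflow as tf

# 10 possible word IDs (0..9), each mapped to a vector of length 4.
embedding = tf.keras.layers.Embedding(input_dim=10, output_dim=4)

# One sample containing three word IDs; ID 4 appears twice.
word_ids = np.array([[4, 9, 4]])

vectors = np.asarray(embedding(word_ids))   # shape (1, 3, 4)
table = embedding.get_weights()[0]          # the embeddings matrix, shape (10, 4)

# Each output vector is simply the table row for that word ID.
is_lookup = np.allclose(vectors[0], table[[4, 9, 4]])
print(vectors.shape, table.shape, is_lookup)
```

Because the same ID always selects the same row, the first and third output vectors are identical. During training, the rows of `table` are updated like any other weights.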

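What if I want to bring in a pre-trained GloVe model as my embedding? The example below uses glove/glove.6B.50d.txt as the pre-trained embedding. Note in particular that you can pass the weights argument to supply your own embedding representation, and at the same time you should set trainable to False.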
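The core of that workflow is building a matrix whose row i holds the GloVe vector for the word with ID i. The sketch below shows one way to do it with NumPy. The helpers `load_glove` and `build_embedding_matrix`, the `word_index` mapping, and the two-line in-memory sample are all hypothetical stand-ins so the snippet runs without the real file; it assumes the usual GloVe text format of a word followed by space-separated floats on each line.

```python
import io
import numpy as np

def load_glove(lines):
    """Parse GloVe-style text lines ("word v1 v2 ...") into {word: vector}."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return vectors

def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector for the word with ID i; unknown words stay zero."""
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")  # row 0 = padding
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = vec
    return matrix

# Tiny stand-in for the real file (3-dimensional vectors, made-up values).
sample = io.StringIO("cat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n")
glove = load_glove(sample)

word_index = {"cat": 1, "dog": 2, "unicorn": 3}  # "unicorn" is not in the sample
embedding_matrix = build_embedding_matrix(word_index, glove, dim=3)
print(embedding_matrix.shape)
```

With the real file you would pass `open("glove/glove.6B.50d.txt", encoding="utf-8")` to `load_glove` and use `dim=50`. The matrix can then be given to the layer as described above, for example `Embedding(input_dim=embedding_matrix.shape[0], output_dim=50, weights=[embedding_matrix], trainable=False)`.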

GitHub for your reference:

harenlin/Simple-Text-Generation (github.com)

Official Document of Embedding Layer:

Keras documentation: Embedding layer (keras.io)

This article may be updated at any time. Thanks for reading! If you like the content, please click the “clap” button. You can also follow me to keep track of new articles. Feel free to contact me via LinkedIn or email.


Written by Haren Lin
