[paper-review] Contractive Auto-Encoders: Explicit Invariance During Feature Extraction

ICML 2011. [Paper] [Github]

Salah Rifai¹, Pascal Vincent¹, Xavier Muller¹, Xavier Glorot¹, Yoshua Bengio¹ > ¹ Dept. IRO, Universit´e de Montr´eal. Montr´eal (QC), H3C 3J7, Canada

Jun. 11

한 문장 요약

Explicit Penalty term을 추가하여 robust feature extractor를 제안한다.

Unlabeled data를 활용하여 data의 general한 representation을 학습하는 비지도 학습 방법론은 NLP분야에서 큰 발전을 이끌었으며 컴퓨터비전에서도 비지도학습 중 한 분야인 자가지도학습에서 Pretext Task와 Contrastive Learning 방법론이 다양한 Task에서 SOTA 성능을 보여주고 있다.

Previous Approaches

Basic auto-encoder (AE)

Encoder:
\[\begin{equation} h = f(x) = s_{f}(Wx+b_{x}) \end{equation}\]
\(s_{f}\): nonlinear activation function (e.g., \(\text{sigmoid}=\frac{1}{1+\exp^{-x}}\))
\(W\): encoder is parameterized with \(d_{h}\times d_{x}\)
\(b_{h} \in \R^{d_{h}}\): basis vector.
Decoder:
\[\begin{equation} y = g(h) = s_{g} (W' h+b_{y}) \end{equation}\]
\(s_g\): decoder’s activation function.

Auto-encoder의 training objective는 reconstruction error를 최소화하는 \(\theta=\{W,b_h,b_y\}\) parameter 값을 찾는 것이다.

\[\begin{equation} \mathcal{J}_{\text{AE}}(\theta) = \sum_{x\in D_{n}} L(x, g(f(x))) \end{equation}\]

Regularized Auto-encoders (AE+\(\text{wd}\))

\[\begin{equation} \mathcal{J}_{\text{AE+wd}}(\theta) = (\sum_{x\in D_{n}} L(x, g(f(x)))) + \lambda \sum_{ij} W^{2}_{ij} \end{equation}\]

Denoising Auto-encoders (DAE)

\[\begin{equation} \mathcal{J}_{\text{DAE}}(\theta) = \sum_{x\in D_{n}} \mathbb{E}_{\tilde{x}\sim q(\tilde{x}\mid x)} [L(x, g(f(\tilde{x})))] \end{equation}\]

Methodology:

\[\begin{equation} \mid \mid J_{f}(x) \mid \mid^{2}_{F} = \sum_{ij} \left( \frac{\partial h_{j}(x)}{\partial x_{i}}\right)^{2} \end{equation}\]

\(x \in \R^{d_{x}}\): Input is mapped by encoding function \(f\) to hidden representation \(h\), \(h \in \R^{d_{h}}\)

이 sensitivity penalization term은 feature space에서 학습 데이터간에 contractive 하게 만들어준다. 즉, 조금 더 representation이 좋아지게 하려는 의도인 것이다.

그렇게 Objective loss function은 아래와 같이 전개할 수 있다.

\[\begin{equation} \mathcal{J}_{\text{CAE}}(\theta) = \sum_{x\in D_{n}} (L(x, g(f(x))) + \lambda \mid\mid J_{f}(x) \mid\mid^{2}_{F}) \end{equation}\]

Conclusion:

DAE 에서는 Encoder가 입력 데이터의 작은 변화 혹은 노이즈에 저항하는 것에 목적을 두었으며, CAE 에서는 Decoder에서 reconstruction을 할 때에 중요하지 않은 입력의 변화를 무시하는 것에 목적을 둔다. 저자는 이러한 효과를 Encoder의 Jacobian matrix에 penalty function을 추가하며 구현했다.

Thoughts:

Auto-encoder에 대한 연구 흐름을 잘 요약해주는 연구인 것 같다.
Encoder의 Jacobian matrix에 Penalty, 즉 이 행렬을 최소화 하는 것이 명확하게 어떠한 물리적인 의미로 Contractive 한 측면에서 도움이 되는지는 잘 이해가 되지 않았다.
Point cloud feature extraction 관련된 논문을 찾다가 읽게 되었으며, 구현된 코드를 돌려보며 얼마나 잘 represent 되는지 확인해봐야 할 것 같다.