For ease of notation, the row vectors of the matrices $\XA$ and $\Xa$, as well as the components of $\yA$ and $\ya$, are distinguished only by the range of their indices: $i$ ranges over $\{1, \cdots, n_2 \}$, while $j$, $a$, $b$ and $c$ range over $\{1, \cdots, n_1\}$.
\subsection{Ridge regression}
Consider the Mahalanobis distance $d(x,y)^2 = \Vert A(x-y) \Vert_2^2$.
We define the first kernel matrix $K \in \R^{n_1 \times n_1}$ such that $K_{j,j'} = \kernel{j}{j'}$.
Set the ridge regression coefficients $\alpha \in \R^{n_1}$ by $\alpha = (K + \lambda I)^{-1} \ya$.
\begin{remark}
If $K$ is singular, the regression problem admits several minimizers $\alpha$, and this particular choice need not be optimal. However, the exact derivation of the gradient below requires a fixed closed-form expression for $\alpha$, and the regularized inverse avoids the more delicate differentiation of a pseudo-inverse (e.g.\ via the SVD).
\end{remark}
From $\alpha$, we define the predictions $\yh_i = \sum_j \alpha_j k_{ij}$, where $k_{ij} = \kernel{i}{j}$.
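For concreteness, here is a minimal NumPy sketch of this pipeline. The Gaussian form $k(x, y) = \exp(-\Vert A(x-y) \Vert_2^2 / \sigma^2)$ is an assumption (it is the kernel form consistent with the gradient formula of section \ref{ssec:1} used below), and the function names are illustrative.
\begin{verbatim}
import numpy as np

def kernel_matrix(X1, X2, A, sigma):
    # Assumed kernel: k(x, y) = exp(-||A(x - y)||^2 / sigma^2).
    # X1: (m, d), X2: (n, d), A: (p, d); returns an (m, n) matrix.
    diff = X1[:, None, :] - X2[None, :, :]          # pairwise rows x_i - y_j
    sq = np.einsum('mnd,pd,pe,mne->mn', diff, A, A, diff)
    return np.exp(-sq / sigma**2)

def fit_predict(Xa, ya, XA, A, sigma, lam):
    # alpha = (K + lam I)^{-1} ya, then yhat_i = sum_j alpha_j k_ij.
    K = kernel_matrix(Xa, Xa, A, sigma)             # first kernel, n1 x n1
    alpha = np.linalg.solve(K + lam * np.eye(len(ya)), ya)
    k = kernel_matrix(XA, Xa, A, sigma)             # cross kernel, n2 x n1
    return k @ alpha
\end{verbatim}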
\subsection{Cost function}
Define the cost function $\mathcal{L} = \sum_i (\yh_i - y_i)^2$, which depends on $A$ implicitly through $\yh$.
We aim to compute $\dA{\mathcal{L}}$, the matrix satisfying $(\dA{\mathcal{L}})_{ij} = \frac{\partial \mathcal{L}}{\partial A_{ij}}$.
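Since $\mathcal{L}$ depends on $A$ both through the coefficients $\alpha$ and through the cross-kernel values $k_{ij}$, the computation can be organized by a direct chain-rule expansion of the definitions above:
\begin{equation}
\dA{\mathcal{L}} = 2 \sum_i (\yh_i - y_i) \, \dA{\yh_i},
\qquad
\dA{\yh_i} = \sum_j \left( k_{ij} \, \dA{\alpha_j} + \alpha_j \, \dA{k_{ij}} \right).
\end{equation}
The term $\dA{k_{ij}}$ follows directly from the kernel gradient below; the main work is the term $\dA{\alpha_j}$.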
Using the result from section \ref{ssec:1}, we find that $\dA{K_{ab}} = -\frac{2 K_{ab}}{\sigma^2} A x_{ab} x_{ab}^T$, where $x_{ab}$ is the difference between the $a$-th and $b$-th rows of $\Xa$. Note that both $a$ and $b$ range over $\{ 1, \cdots, n_1\}$, as they index the first kernel $K$.
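As a sanity check, this gradient can be compared against finite differences; a small sketch under the same assumed Gaussian kernel (all names illustrative):
\begin{verbatim}
import numpy as np

def k_gauss(x, y, A, sigma):
    # Assumed kernel form: exp(-||A(x - y)||^2 / sigma^2).
    z = A @ (x - y)
    return np.exp(-(z @ z) / sigma**2)

rng = np.random.default_rng(0)
d, p, sigma, eps = 3, 2, 1.5, 1e-6
x, y, A = rng.normal(size=d), rng.normal(size=d), rng.normal(size=(p, d))

# Closed form: dK/dA = -(2 k / sigma^2) * A (x - y)(x - y)^T.
v = x - y
grad = -(2 * k_gauss(x, y, A, sigma) / sigma**2) * (A @ np.outer(v, v))

num = np.zeros_like(A)                  # central finite differences
for i in range(p):
    for j in range(d):
        E = np.zeros_like(A); E[i, j] = eps
        num[i, j] = (k_gauss(x, y, A + E, sigma)
                     - k_gauss(x, y, A - E, sigma)) / (2 * eps)
print(np.allclose(grad, num, atol=1e-6))   # expected: True
\end{verbatim}
Writing $H = K + \lambda I$, so that $\alpha = \Hmo \ya$, and applying the entrywise identity $\dA{}(H^{-1})_{jc} = -\sum_{a,b} (\Hmo)_{ja} \, \dA{H_{ab}} \, (\Hmo)_{bc}$ with $\dA{H_{ab}} = \dA{K_{ab}}$, we obtain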
\begin{equation}
\dA{}(\Hmo)_{jc} = \frac{2}{\sigma^2} A \sum_{a,b} (\Hmo)_{ja} (\Hmo)_{bc} K_{ab} \, x_{ab}x_{ab}^T,
\end{equation}
and
\begin{align*}
\dA{\alpha_j} &= \sum_c \dA{}(\Hmo)_{jc} \, y_c, \\
&= \frac{2}{\sigma^2} A \sum_{c,a,b} (\Hmo)_{ja} (\Hmo)_{bc} K_{ab} \, x_{ab}x_{ab}^T \, y_c, \\
&= \frac{2}{\sigma^2} A \sum_{a,b} (\Hmo)_{ja} \, \alpha_b \, K_{ab} \, x_{ab}x_{ab}^T,
\end{align*}
where the last step uses $\alpha_b = \sum_c (\Hmo)_{bc} y_c$.
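This formula can likewise be checked numerically; a sketch reusing \texttt{kernel\_matrix} from the earlier snippet (shapes and names illustrative):
\begin{verbatim}
import numpy as np

def grad_alpha(Xa, ya, A, sigma, lam):
    # d alpha_j / d A as an (n1, p, d) tensor, following the formula above.
    n1 = len(ya)
    K = kernel_matrix(Xa, Xa, A, sigma)
    Hinv = np.linalg.inv(K + lam * np.eye(n1))
    alpha = Hinv @ ya
    diff = Xa[:, None, :] - Xa[None, :, :]            # x_ab, shape (n1, n1, d)
    W = Hinv[:, :, None] * (alpha * K)[None, :, :]    # W[j,a,b] = Hinv_ja alpha_b K_ab
    S = np.einsum('jab,abd,abe->jde', W, diff, diff)  # sum_ab W_jab x_ab x_ab^T
    return (2 / sigma**2) * np.einsum('pd,jde->jpe', A, S)

rng = np.random.default_rng(1)
n1, d, p, sigma, lam, eps = 5, 3, 2, 1.2, 0.1, 1e-6
Xa, ya, A = rng.normal(size=(n1, d)), rng.normal(size=n1), rng.normal(size=(p, d))

def alpha_of(M):                        # alpha as a function of the metric matrix
    K = kernel_matrix(Xa, Xa, M, sigma)
    return np.linalg.solve(K + lam * np.eye(n1), ya)

g, num = grad_alpha(Xa, ya, A, sigma, lam), np.zeros((n1, p, d))
for i in range(p):
    for j in range(d):
        E = np.zeros((p, d)); E[i, j] = eps
        num[:, i, j] = (alpha_of(A + E) - alpha_of(A - E)) / (2 * eps)
print(np.allclose(g, num, atol=1e-5))   # expected: True
\end{verbatim}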
Sums of the form $\sum_{a,b} W_{ab} \, x_{ab} x_{ab}^T$ therefore appear naturally, and the following identity evaluates them in matrix form. Consider two matrices $A$ and $B$ having the same shapes as $\Xa$ and $\XA$ of the introduction (the notation is local to this paragraph; this $A$ is not the Mahalanobis matrix), and set $x_{ij} = a_i - b_j$, the $i$-th row of $A$ minus the $j$-th row of $B$. Define further the matrix
\begin{equation}
\Sigma = \sum_{i,j} W_{ij} x_{ij}x_{ij}^T,
\end{equation}
with some coefficients $W_{ij}$ making up a matrix $W$. Then
\begin{equation}
\Sigma = -A^T W B - B^T W^T A + A^T R A + B^T S B,
\end{equation}
where $R$ and $S$ are both diagonal matrices with $R_{ii} = \sum_j W_{ij}$, and $S_{jj} = \sum_i W_{ij}$.
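A quick numerical verification of this identity (matrices and sizes below are illustrative; $A$ and $B$ are the lemma's matrices, not the Mahalanobis matrix):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
n1, n2, d = 4, 6, 3
A = rng.normal(size=(n1, d))        # same shape as Xa (lemma's A)
B = rng.normal(size=(n2, d))        # same shape as XA (lemma's B)
W = rng.normal(size=(n1, n2))

# Naive double sum: Sigma = sum_ij W_ij (a_i - b_j)(a_i - b_j)^T.
diff = A[:, None, :] - B[None, :, :]
Sigma_naive = np.einsum('ij,ijd,ije->de', W, diff, diff)

# Matrix form of the identity above.
R = np.diag(W.sum(axis=1))          # R_ii = sum_j W_ij
S = np.diag(W.sum(axis=0))          # S_jj = sum_i W_ij
Sigma_fast = -A.T @ W @ B - B.T @ W.T @ A + A.T @ R @ A + B.T @ S @ B
print(np.allclose(Sigma_naive, Sigma_fast))   # expected: True
\end{verbatim}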