Outline of the Algorithm
The Jacobi algorithm for computing the singular value decomposition (SVD) of a general matrix \(A \in \mathbb{R}^{m \times n}\) was proposed by Hestenes. The algorithm proceeds as follows:
\begin{equation}\notag
A^{(0)} = A, \quad A^{(k+1)} = A^{(k)} J^{(k)}, \quad A^{(\infty)} = U \varSigma,
\end{equation}
for \(k = 0, 1, 2, \dots\), where \(J^{(k)}\) is the Jacobi rotation applied at the \(k\)th step to orthogonalize two distinct columns of \(A^{(k)}\). Each \(J^{(k)}\) is determined by the pivot position \((p, q) = (p(k), q(k))\), which depends on the chosen pivot strategy, such as classical or cyclic.
Given a pivot \((p, q)\), we first form the \((p, q)\) submatrix of \((A^{(k)})^T A^{(k)}\):
\begin{equation}\notag
(A^{(k)})^T A^{(k)} ([p, q], [p, q]) =
\begin{bmatrix}
a & c \\ c & b
\end{bmatrix},
\end{equation}
where
\begin{equation}
\label{eq:abc} \tag{1}
a = (a^{(k)}_p)^T a^{(k)}_p, \quad
b = (a^{(k)}_q)^T a^{(k)}_q, \quad
c = (a^{(k)}_p)^T a^{(k)}_q.
\end{equation}
The remaining steps mirror the procedure for the two-sided Jacobi algorithm:
\begin{equation}\notag
\xi = \frac{b - a}{2c}, \quad
t = \frac{\text{sign}(\xi)}{|\xi| + \sqrt{1 + \xi^2}}, \quad
cs = \frac{1}{\sqrt{1 + t^2}}, \quad
sn = cs \times t,
\end{equation}
and
\begin{equation}\notag
J^{(k)}([p, q], [p, q]) =
\begin{bmatrix}
cs & sn \\ -sn & cs
\end{bmatrix}.
\end{equation}
Suppose the algorithm converges after \(K\) steps (we write \(K\) rather than \(m\), which is the row dimension), yielding \(A^{(K+1)}\). The column norms of \(A^{(K+1)}\) are the singular values of \(A\), and \(U = A^{(K+1)} \varSigma^{-1}\) is the matrix of left singular vectors. The matrix \(V\) of right singular vectors can be obtained either by accumulating the product \(V = J^{(0)} \cdots J^{(K)}\) or by solving the matrix equation \(AV = A^{(K+1)}\) for \(V\), as described by Dongarra et al. Thus, \(A = U \varSigma V^T\) constitutes the SVD of \(A\).
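Putting the pieces together, a complete (unoptimized) sketch of the one-sided Jacobi SVD with a cyclic pivot strategy might look as follows. The function name, sweep limit, and tolerance are our own choices, and the matrix is assumed to have full column rank:

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """Illustrative one-sided Jacobi SVD with cyclic pivoting.
    Returns U, s, V such that A ~ U @ diag(s) @ V.T (full column rank assumed)."""
    A = A.astype(float)          # work on a copy
    n = A.shape[1]
    V = np.eye(n)                # accumulates the product of rotations
    for _ in range(max_sweeps):
        converged = True
        for p in range(n - 1):
            for q in range(p + 1, n):
                a = A[:, p] @ A[:, p]
                b = A[:, q] @ A[:, q]
                c = A[:, p] @ A[:, q]
                if abs(c) <= tol * np.sqrt(a * b):
                    continue     # columns p, q already orthogonal enough
                converged = False
                xi = (b - a) / (2.0 * c)
                t = np.copysign(1.0, xi) / (abs(xi) + np.hypot(1.0, xi))
                cs = 1.0 / np.hypot(1.0, t)
                sn = cs * t
                # right-multiply columns p, q of A by the rotation; same for V
                Ap, Aq = A[:, p].copy(), A[:, q].copy()
                A[:, p], A[:, q] = cs * Ap - sn * Aq, sn * Ap + cs * Aq
                Vp, Vq = V[:, p].copy(), V[:, q].copy()
                V[:, p], V[:, q] = cs * Vp - sn * Vq, sn * Vp + cs * Vq
        if converged:
            break
    s = np.linalg.norm(A, axis=0)   # singular values = column norms
    U = A / s                       # U = A^(final) Sigma^{-1}
    return U, s, V
```

The returned singular values are not sorted; a production code would sort them and permute \(U\) and \(V\) accordingly.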
Preconditioning
Zlatko Drmač and Krešimir Veselić published a paper focusing on pre-processing the matrix to accelerate the convergence of the one-sided Jacobi algorithm. Their approach includes:
- Choosing between \(A\) and \(A^T\) for square matrices: Let \(A = DQ\), where \(D\) is diagonal and \(Q\) is orthogonal. Applying the one-sided Jacobi algorithm to \(A\) implicitly diagonalizes \(A^T A = Q^T D^2 Q\). However, applying it to \(A^T\) diagonalizes \(AA^T = D^2\), which is already diagonal. Therefore, one should choose \(A^{(0)}\) to maximize
\begin{equation}\notag
\| \text{diag}((A^{(0)})^T A^{(0)}) \|_2
\end{equation}
or minimize the diagonal entropy of \((A^{(0)})^T A^{(0)}\).
- Using QR factorization with rank-revealing column pivoting: Set \(A^{(0)} = R\). This redistributes the mass of the diagonals of \(R^T R\), positively impacting convergence. Additionally, for a rectangular \(m \times n\) matrix, a rank-revealing algorithm can reduce the problem to \(n \times n\).
- Setting \(A^{(0)} = R^T\): This implicitly performs one step of Rutishauser's LR diagonalization, \(R^T R \to R R^T\), which has a nontrivial diagonalizing effect.
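The QR-based ideas above can be sketched as follows (illustrative only: sorting columns by decreasing norm is a crude stand-in for true rank-revealing column pivoting, such as LAPACK's xGEQP3, and the function name is ours):

```python
import numpy as np

def precondition(A):
    """Reduce an m x n problem (m >= n) to n x n: pivoted QR, then
    hand A^(0) = R^T to the one-sided Jacobi algorithm."""
    # crude pivoting: order columns by decreasing 2-norm
    perm = np.argsort(-np.linalg.norm(A, axis=0))
    Q, R = np.linalg.qr(A[:, perm])   # reduced QR: R is n x n
    return R.T, Q, perm               # iterate on A^(0) = R^T
```

Since \(R\) and \(A\) share the same singular values, the Jacobi iteration now runs on an \(n \times n\) matrix, and the left singular vectors of \(A\) are recovered through \(Q\).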
Superior Accuracy
While the one-sided Jacobi algorithm typically cannot match the speed of bidiagonalization-based methods such as QR iteration and divide-and-conquer, it offers distinct advantages in accuracy for certain matrices. Perturbation theory for bidiagonalization methods shows that:
\begin{equation}\notag
\frac{|\sigma_i - \hat{\sigma}_i|}{\sigma_i} \le O(\varepsilon) \kappa(A),
\end{equation}
where \(\sigma_i\) and \(\hat{\sigma}_i\) are the singular values of \(A\) and \(A + \varDelta A\), respectively. Here, \(\varDelta A\) is a small perturbation arising during the algorithm’s application, satisfying
\begin{equation}\notag
\| \varDelta A \|_2 \le O(\varepsilon) \| A \|_2,
\end{equation}
and \(\kappa(A) = \sigma_{\max}(A) / \sigma_{\min}(A)\) is the 2-norm condition number of \(A\). This inequality implies that small singular values may be highly inaccurate if \(\kappa(A)\) is large.
However, the one-sided Jacobi algorithm improves this bound. Let \(A = BD\), where \(B\) has unit 2-norm and
\begin{equation}\notag
D = \text{diag}(\| a_i \|_2).
\end{equation}
In the notation of \eqref{eq:abc}, James Demmel and Krešimir Veselić proved the bound:
\begin{equation}\notag
\frac{|\sigma_i - \hat{\sigma}_i|}{\sigma_i} \le O(\varepsilon) \kappa(B),
\end{equation}
provided the algorithm terminates only if
\begin{equation}\notag
|c| \le \text{tol} \times \sqrt{ab},
\end{equation}
where \(\text{tol} > 0\) is a convergence parameter. For strongly scaled matrices \(A\), we expect that \(\kappa(B) \ll \kappa(A)\).
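To illustrate the gap between \(\kappa(B)\) and \(\kappa(A)\), here is a small numerical check of our own construction. We take the extreme case where \(B\) is orthonormal (so \(\kappa(B) = 1\) exactly) and apply a strong column scaling \(D\):

```python
import numpy as np

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # B orthonormal: kappa(B) = 1
D = np.diag(10.0 ** np.arange(0, 15, 3))           # column scales 1, 1e3, ..., 1e12
A = Q @ D                                          # A = B D, strongly scaled
print(f"kappa(B) = {np.linalg.cond(Q):.2e}, kappa(A) = {np.linalg.cond(A):.2e}")
```

Here \(\kappa(A) = 10^{12}\) while \(\kappa(B) = 1\), so the Jacobi bound \(O(\varepsilon)\kappa(B)\) guarantees high relative accuracy even for the tiny singular values.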
Implementation
The one-sided Jacobi algorithm has been implemented in LAPACK: routine xGESVJ implements the basic algorithm, and xGEJSV adds the preconditioning of Drmač and Veselić.
References
- Dongarra, Gates, Haidar, Kurzak, Luszczek, Tomov & Yamazaki, The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale, SIAM Rev. 60, 808–865, 2018
- Drmač & Veselić, New Fast and Accurate Jacobi SVD Algorithm. I, SIAM J. Matrix Anal. Appl. 29, 1322–1342, 2008