Publication | Closed Access
Does $\ell _{p}$ -Minimization Outperform $\ell _{1}$ -Minimization?
59
Citations
37
References
2017
Year
In many application areas ranging from bioinformatics to imaging, we are faced with the following question: can we recover a sparse vector x <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">o</sub> ∈ ℝ <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">N</sup> from its undersampled set of noisy observations y ∈ ℝ <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">n</sup> , y = Ax <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">o</sub> +w. The last decade has witnessed a surge of algorithms and theoretical results to address this question. One of the most popular schemes is the ℓ <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">p</sub> -regularized least squares given by the following formulation:x̂(y, p) ∈ arg min <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">x</sub> (1/2)∥y - Ax∥ <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sub> <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sup> + γ∥x∥ <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">p</sub> <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">p</sup> , where p ∈ [0, 1]. Among these optimization problems, the case p = 1, also known as LASSO, is the best accepted in practice, for the following two reasons. First, thanks to the extensive studies performed in the fields of high-dimensional statistics and compressed sensing, we have a clear picture of LASSO's performance. Second, it is convex and efficient algorithms exist for finding its global minima. Unfortunately, neither of the above two properties hold for 0 ≤ p <; 1. However, they are still appealing because of the following folklores in the high-dimensional statistics. First, x̂(γ, p) is closer to x <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">o</sub> than x̂(γ, 1). Second, if we employ iterative methods that aim to converge to a local minima of arg min <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">x</sub> (1/2)∥y - Ax∥ <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sub> <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sup> + γ∥x∥ <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">p</sub> <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">p</sup> , then under good initialization, these algorithms converge to a solution that is still closer to x <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">o</sub> than x̂(γ, 1). In spite of the existence of plenty of empirical results that support these folklore theorems, the theoretical progress to establish them has been very limited. This paper aims to study the above-mentioned folklore theorems and establish their scope of validity. Starting with approximate message passing (AMP) algorithm as a heuristic method for solving ℓ <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">p</sub> -regularized least squares, we study the following questions. First, what is the impact of initialization on the performance of the algorithm? Second, when does the algorithm recover the sparse signal x <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">o</sub> under a “good” initialization? Third, when does the algorithm converge to the sparse signal regardless of the initialization? Studying these questions will not only shed light on the second folklore theorem, but also lead us to the answer the first one, i.e., the performance of the global optima x̂(γ, p). For that purpose, we employ the replica analysis <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">1</sup> to show the connection between the solution of AMP and x̂(γ, p) in the asymptotic settings. This enables us to compare the accuracy of x̂(γ, p) and x̂(γ, 1). In particular, we will present an accurate characterization of the phase transition and noise sensitivity of ℓ <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">p</sub> -regularized least squares for every 0 ≤ p <; 1. Our results in the noiseless setting confirm that ℓ <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">p</sub> -regularized least squares (if γ is tuned optimally) exhibits the same phase transition for every 0 ≤ p <; 1 and this phase transition is much better than that of LASSO. Furthermore, we show that in the noisy setting, there is a major difference between the performance of ℓ <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">p</sub> -regularized least squares with different values of p. For instance, we will show that for very small and very large measurement noises, p = 0 and p = 1 outperform the other values of p, respectively.
| Year | Citations | |
|---|---|---|
Page 1
Page 1