¹College of Science, Department of Mathematics, University of Sulaimani, Sulaymaniyah, 46001, Kurdistan Region of Iraq, ² College of Science, Department of Mathematics, University of Kirkuk, Kirkuk, 36013, Iraq.
Received: 03-07-2025 Accepted: 16-08-2025 Published: 27-09-2025
ABSTRACT
This paper presents a new hybrid conjugate gradient (CG) method for solving large-scale unconstrained optimization problems where classical CG algorithms may not perform well. This new method integrates four classical algorithms, namely, Liu and Storey, Fletcher and Reeves, Dai and Yuan, and Polak, Ribiere, and Polyak, using a convex combination of CGs and an inexact line search based on strong Wolfe conditions, and adheres to the Dai-Liao conjugate condition to enhance convergence properties. The theoretical study established the conditions for sufficient descent and global convergence of the hybrid CG method. Numerical studies demonstrated that the proposed method significantly reduced the number of iterations and computational time, achieving superior evaluation efficiency compared with other CG methods. In particular, the effectiveness of the proposed algorithm is validated for image impulse denoising, where it can recover high-quality images while preserving significant features, implying its practical applicability to real-world signal and image processing problems.
Index Terms: Conjugate Gradient Method, Strong Wolfe Condition, Global Convergence, Unconstrained Optimization, Convex Combination, Hybrid Method, Image Processing
Conjugate gradient (CG) methods are a category of unconstrained optimization algorithms distinguished by their low memory requirements and robust local and global convergence characteristics [1]. The CG method was originally introduced by Hestenes and Stiefel [2] to solve linear systems. It was later adapted to unconstrained optimization, because solving a linear system is equivalent to minimizing a positive definite quadratic function [3]. CG methods have contributed significantly to the solution of large-scale non-linear optimization problems. The CG method is typically applied to the following unconstrained optimization problem:
min_{x∈ℝ^n} f(x), (1)
where f: ℝ^n → ℝ is a smooth non-linear function whose gradient is available.
The iterative formula of the CG method is expressed as follows,
x_{k+1} = x_k + α_k d_k, k = 0, 1, 2, …, (2)
and
d_{k+1} = −g_{k+1} + β_k d_k, d_0 = −g_0. (3)
In this context, x_0 is the initial point and x_k is the current iterate. The gradient of f at x_k is denoted by g_k = ∇f(x_k). The search direction is denoted by d_k, and the conjugate (update) parameter β_k ∈ ℝ distinguishes the various CG methods. In addition, the step size α_k > 0 is a positive scalar that can be obtained by numerous line search methods.
In this study, we use an inexact line search, namely the strong Wolfe line search, which has received considerable attention in the convergence analysis and implementation of CG methods and is characterized by the following requirements [4,5]:
f(x_k + α_k d_k) ≤ f(x_k) + δ α_k g_k^T d_k, (4)
|g(x_k + α_k d_k)^T d_k| ≤ σ |g_k^T d_k|, (5)
where δ and σ are scalars satisfying 0 < δ ≤ σ < 0.5.
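As a concrete illustration, the following minimal MATLAB sketch tests whether a candidate step size satisfies the two strong Wolfe inequalities (4) and (5); the function handles f and gradf and the function name itself are illustrative assumptions, not part of the proposed algorithm.

% Check the strong Wolfe conditions (4)-(5) for a candidate step size alpha.
% f and gradf are function handles; x is the current iterate, d the search direction,
% and delta, sigma are the line-search parameters with 0 < delta <= sigma < 0.5.
function ok = strong_wolfe_ok(f, gradf, x, d, alpha, delta, sigma)
    g  = gradf(x);                   % gradient at the current point
    gd = g' * d;                     % directional derivative g_k^T d_k (negative for a descent direction)
    x_new = x + alpha * d;
    suff_decrease = f(x_new) <= f(x) + delta * alpha * gd;      % condition (4)
    curvature     = abs(gradf(x_new)' * d) <= sigma * abs(gd);  % condition (5)
    ok = suff_decrease && curvature;
end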
Assume that ‖·‖ denotes the Euclidean norm and y_k = g_{k+1} − g_k. If f is a strongly convex quadratic, then, in theory, all six choices of the update parameter in Table 1 are equivalent when an exact line search is used [12]. However, they behave quite differently for general non-quadratic functions with an inexact line search. Different values of the scalar parameter β_k yield different CG methods, which can be classified into two groups.
TABLE 1: Choices of βk in standard conjugate gradient methods.
In the first group, all CG parameters share the same numerator g_{k+1}^T y_k. These methods were proposed by Hestenes and Stiefel (HS), Polak, Ribiere, and Polyak (PRP), and Liu and Storey (LS), with the following CG parameters [1]:
β_k^HS = g_{k+1}^T y_k / (d_k^T y_k), β_k^PRP = g_{k+1}^T y_k / ‖g_k‖², β_k^LS = g_{k+1}^T y_k / (−g_k^T d_k).
In the second group, ‖g_{k+1}‖² is the common numerator of all the CG parameters. These methods were proposed by Fletcher and Reeves (FR), Fletcher (conjugate descent, CD), and Dai and Yuan (DY), with the following CG parameters [1]:
β_k^FR = ‖g_{k+1}‖² / ‖g_k‖², β_k^CD = ‖g_{k+1}‖² / (−g_k^T d_k), β_k^DY = ‖g_{k+1}‖² / (d_k^T y_k).
Each CG method has distinct advantages and disadvantages. The second group of FR, CD, and DY methods, with ‖g_{k+1}‖² in the numerator, has noticeably worse numerical performance than the first group of HS, PRP, and LS algorithms [13]. The poor numerical performance of the second group can be attributed to the jamming phenomenon, whereby the algorithm may execute many short iterations without making significant progress toward the minimum [13]. Nonetheless, the first group of HS, PRP, and LS, with g_{k+1}^T y_k in the numerator, inherently has an approximate automatic restart mechanism that prevents jamming. Specifically, when the step α_k is small, the factor y_k in the numerator approaches zero. Consequently, β_k decreases, and the new search direction d_{k+1} approaches the steepest descent direction. However, Powell [14] constructed a three-dimensional counterexample demonstrating that the PRP and HS methods may cycle indefinitely without approaching a solution, indicating their lack of global convergence under certain conditions.
The FR, CD, and DY methods generally exhibit strong global convergence properties [15], although their computational performance may be limited [13]. Conversely, the HS, PRP, and LS methods can fail to converge consistently, even though they often exhibit superior computational efficiency [13]. A comprehensive review of the various CG approaches is given by Hager and Zhang [1].
The most important category of CG algorithms is hybrid CG algorithms. The goal of this study is to improve computational speed while maintaining robust global convergence by combining four types of CG methods. Hybrid CG methods typically switch from a parameter in the second group to one in the first group when the iterations become stuck [16]. Several hybrid CG methods have been proposed recently. Andrei proposed the hybrid method β_k^hyb = (1 − θ_k)β_k^HS + θ_k β_k^DY [17]. Djordjević proposed the hybrid methods β_k^hyb = (1 − θ_k)β_k^PRP + θ_k β_k^FR and β_k^hyb = (1 − θ_k)β_k^LS + θ_k β_k^FR [18,19]. Babando et al. proposed the hybrid method β_k^hyb = (1 − θ_k)β_k^PRP + θ_k β_k^DY [3]. Liu and Li introduced a new hybrid approach combining two methods, β_k^hyb = (1 − θ_k)β_k^LS + θ_k β_k^DY [20]. Hanachi et al. presented hybrid approaches that combine three CG methods [21,22].
Inspired by the research of Hanachi et al. [21] and [22], we propose a new hybrid CG method that combines the LS, FR, DY, and PRP methods to solve problem (1). This study represents the first effort to propose and analyze a hybrid CG method based on a convex combination of four classical CG methods; to the best of our knowledge, no previous study has explored such a combination for developing CG algorithms. The remainder of this paper is organized as follows: In section 2, we introduce the hybrid CG approach and show how the parameters ζ_k, ξ_k, and ω_k are obtained. Under mild conditions, we show that the proposed method, used with a strong Wolfe line search, provides directions that satisfy the sufficient descent condition. The algorithm is described in section 3. In section 4, we examine the convergence properties of the proposed method. Section 5 reports comprehensive numerical comparisons with several existing hybrid methods on 170 distinct test problems, demonstrating the improved efficiency of the proposed method, and section 6 validates the method on an application to image impulse denoising. Section 7 concludes with a brief summary.
In this section, to enhance the performance of the CG updating parameters proposed by Liu and Storey [10], Fletcher and Reeves [6], Dai and Yuan [11], and Polak and Ribiere [7] and Polyak [8], we employ their convex combination, in which the parameter β_k of the proposed method, referred to as β_k^hRH, is given by
β_k^hRH = ζ_k β_k^LS + ξ_k β_k^FR + ω_k β_k^DY + (1 − ζ_k − ξ_k − ω_k) β_k^PRP. (6)
The parameters ζ_k, ξ_k, and ω_k in equation (6), which satisfy 0 ≤ ζ_k, ξ_k, ω_k ≤ 1, are referred to as hybridization parameters. These parameters are obtained in a specific manner, which will be discussed in detail later, and the search direction d_k is computed from
d_{k+1} = −g_{k+1} + β_k^hRH d_k, d_0 = −g_0. (7)
To determine the step length αk, we utilize the strong Wolfe line search (4), (5).
There are 15 separate cases depending on the parameter values where 0≤ζk, ξk, ωk≤1 and ζk+ξk+ωk≤1.
It is clear from equations (6) and (7) that
For both numerical computation and convergence analysis, the traditional conjugacy condition
d_{k+1}^T y_k = 0 (9)
is crucial in the CG method. This condition relies on an exact line search to obtain the step size α_k; however, in practical computations, where an inexact line search is commonly utilized, the condition may not be strictly satisfied.
In 1978, Perry [25] generalized the conjugacy condition by incorporating ideas from quasi-Newton methods. In particular, Perry modified the standard conjugacy condition using the secant equation, obtaining
d_{k+1}^T y_k = −g_{k+1}^T s_k. (10)
Perry's extension aligns the CG method with the quasi-Newton direction and makes it more flexible and robust. However, it is still contingent on the exact line search and is therefore not particularly effective in certain numerical optimization procedures; in practice, inexact rather than exact line searches are used to determine the step size α_k.
Therefore, in 2001, Dai and Liao [26] suggested a less stringent conjugacy condition, called the D-L conjugacy condition:
d_{k+1}^T y_k = −t g_{k+1}^T s_k, t ≥ 0, (11)
where s_k = x_{k+1} − x_k and t is a non-negative modulating parameter. The selection of an optimal value for t remains an active area of research, with several potential choices proposed in [27-35]. Notably, for t = 0, equation (11) reduces to equation (9), whereas for t = 1 it corresponds to equation (10). To select the parameter t, this study adopts the choice specified in [30]. The parameters ζ_k, ξ_k, and ω_k are chosen by applying the D-L conjugacy condition (11). Taking the inner product of equation (3) with the vector y_k and substituting β_k from equation (6) yields
By enforcing the D-L condition d_{k+1}^T y_k = −t g_{k+1}^T s_k, we obtain
To simplify the presentation, we define the following terms:
Rearranging the equation to solve for the hybridization parameters yields
This is an underdetermined linear equation in ζ_k, ξ_k, and ω_k. For our algorithm, we propose a strategy in which one parameter is computed from the values of the other two (which can be set to default or previous values). This yields the following update rule:
Note that while these formulas appear simpler, substituting the full expressions for ALS, AFR, ADY, and APRP would recover more detailed forms of the equations. However, this formulation makes the underlying structure of the linear system explicit. In our algorithm, care is taken to handle cases in which any denominator is close to zero.
The parameters ζ_k, ξ_k, and ω_k may lie outside the interval [0, 1]. The following rule is applied to obtain a valid convex combination in equation (6): if any of ζ_k, ξ_k, ω_k is less than or equal to 0, it is set to 0 in (6); if any is greater than or equal to 1, it is set to 1 in (6); and if ζ_k + ξ_k + ω_k ≥ 1, the parameters are rescaled so that their sum equals 1 in (6).
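As a hedged MATLAB illustration of this rule (variable names are ours, not from the reference implementation), the parameters can be clamped to [0, 1] and, when their sum exceeds 1, rescaled so that the combination in (6) remains convex:

% Clamp the hybridization parameters to [0, 1] and rescale if their sum exceeds 1,
% so that zeta, xi, omega and 1 - zeta - xi - omega all remain in [0, 1].
zeta  = max(0, min(1, zeta));
xi    = max(0, min(1, xi));
omega = max(0, min(1, omega));
s = zeta + xi + omega;
if s > 1
    zeta = zeta / s;  xi = xi / s;  omega = omega / s;   % normalize the sum to 1
end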
Algorithm 1: Hybrid Rega and Hawraz (hRH) Algorithm
Require: Initial point x0∈Rn, parameters δ,σ with 0 <δ≤σ<0.5, tolerance: ϵ = 10−6
Initialization: Set k = 0, compute f(x0), g0 =∇f(x0). If ||g0 || < ϵ, stop.
Set: d0 = −g0.
while ||gk||≥ϵ do
Line Search: Find step size αk > 0 satisfying strong Wolfe conditions (4) and (5).
Update variables:
x_{k+1} = x_k + α_k d_k; g_{k+1} = ∇f(x_{k+1})
s_k = x_{k+1} − x_k; y_k = g_{k+1} − g_k.
Compute hybridization parameters:
Compute ALS, AFR, ADY, APRP using equations (12)-(15).
Compute initial values for ζk,ξk,ωk using equations (16)-(18).
If any denominator is near zero, the corresponding parameter is set to zero.
Enforce convex combination constraints: Clamp each parameter to the interval [0, 1]:
ζk← max(0, min(1, ζk))
ξk← max(0, min(1, ξk))
ωk← max(0, min(1, ωk))
if ζk+ξk+ωk > 1 then
Normalize the sum to 1:
end if
Compute conjugate parameter: Calculate β_k^hRH using equation (6).
Update search direction: Set d_{k+1} = −g_{k+1} + β_k^hRH d_k
Restart condition (Powell):
if |g_{k+1}^T g_k| ≥ 0.2‖g_{k+1}‖² then set d_{k+1} = −g_{k+1}
end if
Iteration Update: Set k← k+1
end while
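The following MATLAB sketch outlines one possible realization of the main loop of Algorithm 1. It is a hedged illustration only: strong_wolfe_search and hybrid_params are hypothetical helper names standing in for the strong Wolfe line search and for the computation of ζ_k, ξ_k, ω_k via (16)-(18), respectively, and the listing is not the authors' reference implementation.

function [x, fval] = hRH_sketch(f, gradf, x, delta, sigma, tol, maxit)
% Hedged sketch of the hRH iteration (Algorithm 1); helper routines are hypothetical.
    g = gradf(x);  d = -g;  k = 0;
    while norm(g) >= tol && k < maxit
        alpha = strong_wolfe_search(f, gradf, x, d, delta, sigma);   % hypothetical line search
        x_new = x + alpha * d;
        g_new = gradf(x_new);
        s = x_new - x;   y = g_new - g;
        % Hybridization parameters (16)-(18), assumed clamped/normalized so that (6) stays convex.
        [zeta, xi, omega] = hybrid_params(g, g_new, d, s, y);        % hypothetical helper
        beta_LS  = (g_new' * y) / (-g' * d);        % Liu-Storey
        beta_FR  = (g_new' * g_new) / (g' * g);     % Fletcher-Reeves
        beta_DY  = (g_new' * g_new) / (d' * y);     % Dai-Yuan
        beta_PRP = (g_new' * y) / (g' * g);         % Polak-Ribiere-Polyak
        beta = zeta*beta_LS + xi*beta_FR + omega*beta_DY + (1 - zeta - xi - omega)*beta_PRP;
        if abs(g_new' * g) >= 0.2 * (g_new' * g_new)   % Powell restart test
            d = -g_new;
        else
            d = -g_new + beta * d;
        end
        x = x_new;  g = g_new;  k = k + 1;
    end
    fval = f(x);
end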
3. SUFFICIENT DESCENT CONDITION
In this section, we demonstrate the sufficient descent property of the proposed method, which is crucial for establishing global convergence. The following fundamental assumptions on the objective function are required to prove the global convergence of the hRH method.
Assumption 1. The level set L = {x ∈ ℝ^n : f(x) ≤ f(x_0)} is bounded, i.e., ∃ a constant B > 0 such that
‖x‖ ≤ B, ∀x ∈ L. (19)
Assumption 2. In a neighborhood N of L, the function f is continuously differentiable, and its gradient ∇f(x) is Lipschitz continuous, that is, ∃ a constant 0<L<∞ such that:
‖∇f(x) − ∇f(y)‖ ≤ L‖x − y‖, ∀x, y ∈ N. (20)
Under Assumptions (1) and (2) on f, ∃ a constant c such that
‖∇f(x)‖ ≤ c, ∀x ∈ L. (21)
Theorem 3.1. If the relations (6) and (7) are valid, then
d_{k+1} = ζ_k d_{k+1}^LS + ξ_k d_{k+1}^FR + ω_k d_{k+1}^DY + (1 − ζ_k − ξ_k − ω_k) d_{k+1}^PRP, (22)
where d_{k+1}^LS, d_{k+1}^FR, d_{k+1}^DY, and d_{k+1}^PRP denote the search directions generated by the LS, FR, DY, and PRP methods, respectively.
Proof. Now, from (7), we have
Substituting equation (6), we can write
It follows that
Here, we obtain
Thus, the theorem has been proven.
Theorem 3.2. Let the sequences {g_k} and {d_k} be generated by the hRH method. Then the search direction d_k satisfies the sufficient descent condition, i.e., ∃ a constant c > 0 such that
g_k^T d_k ≤ −c‖g_k‖², ∀k ≥ 0. (23)
Proof. According to the hRH method, when the Powell restart criterion is satisfied, that is, |g_{k+1}^T g_k| ≥ 0.2‖g_{k+1}‖², it follows that d_{k+1} = −g_{k+1}, and (23) is trivially valid. We therefore proceed under the assumption that the Powell restart criterion is not satisfied, i.e.,
|g_{k+1}^T g_k| < 0.2‖g_{k+1}‖². (24)
We prove the claim by mathematical induction. For k = 0, d_0 = −g_0, so g_0^T d_0 = −‖g_0‖², and the sufficient descent condition is satisfied for k = 0. We now assume that (23) holds for k; thus
g_k^T d_k ≤ −c‖g_k‖² < 0. (25)
We now show that the claim is true for k+1. From the strong Wolfe condition (5),
σ g_k^T d_k ≤ g_{k+1}^T d_k ≤ −σ g_k^T d_k. (26)
Multiplying (22) by g_{k+1}^T from the left yields
It is necessary to prove 15 cases depending on the parameter values:
Case 1: If ζk=1, ξk=0 and ωk=0, then the relation (27) becomes
The search direction for the LS method is given by
d_{k+1}^LS = −g_{k+1} + β_k^LS d_k. (28)
Taking the inner product of (28) with g_{k+1} yields
Using the definition y_k = g_{k+1} − g_k, we can expand the term g_{k+1}^T y_k as g_{k+1}^T y_k = ‖g_{k+1}‖² − g_{k+1}^T g_k. Taking the absolute value and applying the triangle inequality, we obtain
|g_{k+1}^T y_k| ≤ ‖g_{k+1}‖² + |g_{k+1}^T g_k|.
From the second strong Wolfe condition (5), and since d_k is a descent direction by (25), we obtain
|g_{k+1}^T d_k| ≤ σ|g_k^T d_k| = −σ g_k^T d_k. (29)
Substituting (29) into our main expression, we obtain
Since we assumed that the Powell restart condition is not met, we have from (24) that |g_{k+1}^T g_k| ≤ 0.2‖g_{k+1}‖². Substituting into (30):
Let c1 = (1−1.2σ). Since σ<0.5, we have c1>0. Therefore:
Therefore, the search direction d_{k+1}^LS fulfills the sufficient descent condition.
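For completeness, the chain of estimates used in this case can be written out as follows (a sketch under the stated assumptions, with (26) and (24) supplying the two bounds):

\[
\begin{aligned}
g_{k+1}^{T} d_{k+1}^{LS}
  &= -\|g_{k+1}\|^{2} + \frac{g_{k+1}^{T} y_k}{-g_k^{T} d_k}\, g_{k+1}^{T} d_k \\
  &\le -\|g_{k+1}\|^{2}
     + \frac{\left(\|g_{k+1}\|^{2} + |g_{k+1}^{T} g_k|\right)\left(-\sigma g_k^{T} d_k\right)}{-g_k^{T} d_k}
     \qquad \text{since } |g_{k+1}^{T} d_k| \le -\sigma g_k^{T} d_k \\
  &\le -\|g_{k+1}\|^{2} + 1.2\,\sigma \|g_{k+1}\|^{2}
     \qquad \text{since } |g_{k+1}^{T} g_k| \le 0.2\|g_{k+1}\|^{2} \\
  &= -(1 - 1.2\sigma)\,\|g_{k+1}\|^{2} = -c_1 \|g_{k+1}\|^{2}.
\end{aligned}
\]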
Case 2: If ζk=0, ξk=1 and ωk=0, then the relation (27) becomes
A sufficient descent condition for the FR method under strong Wolfe conditions was established in [1]. Thus, ∃ a constant c2 > 0 such that
Case 3: If ζk=0, ξk=0 and ωk=1, then the relation (27) becomes
Thus, it has been shown that d_{k+1}^DY fulfills the sufficient descent condition.
Case 4: If ζk=0, ξk=0 and ωk=0 then the relation (27) becomes
We demonstrate the sufficient descent condition of the PRP method by the following theorem.
Theorem 3.3. [19] Under Assumptions (1) and (2), suppose that the step sizes αk satisfy conditions (4) and (5) with σ<½. If ‖sk‖→0 and there exist non-negative constants τ1 and τ2 such that
where s_k = x_{k+1} − x_k = α_k d_k. Then d_{k+1}^PRP satisfies the sufficient descent condition ∀k.
Proof.
Now, multiplying (34) by g_{k+1}^T from the left, we get
By the Cauchy-Schwarz inequality, we have
From (20), ‖y_k‖ ≤ L‖s_k‖, so
By using (33), we get
Using (32), it further becomes
However, because of the assumption ‖s_k‖ → 0, the second summand in (42) tends to zero, so there exists a number 0 < γ < 1 such that
Now, (42) becomes
where c_4 = 1 − γ > 0. (47)
Therefore, a sufficient descent condition holds for all k.
Case 5: If ξk=0, ωk=0 and 0<ζk<1, then the relation (27) becomes
The sufficient descent condition is satisfied in this case, as discussed in [36], i.e.,
Case 6: If ζk=0, 0<ξk<1 and ωk=0, then the relation (27) becomes
Djordjević established in [18] that the sufficient descent condition is satisfied for d_{k+1}^FRPRP.
Case 7: If ζk=0, ξk=0 and 0<ωk<1, then the relation (27) becomes
Babando et al. [3] proved that the sufficient descent condition holds for d_{k+1}^DYPRP.
Case 8: If ζk+ξk=1, 0<ζk<1, 0<ξk<1 and ωk=0, then the relation (27) becomes
Djordjević [19] showed that the sufficient descent condition is satisfied for d_{k+1}^LSFR.
Case 9: If 0<ζk<1, 0<ξk<1, ωk=0 and 0<ζk+ξk<1, then the relation (27) becomes
Hanachi et al. proved in [22] that d_{k+1}^LSFRPRP satisfies the sufficient descent condition ∀k, i.e.,
Case 10: If 0<ζk<1, 0<ωk<1, ξk=0 and ζk+ωk=1, then the relation (27) becomes
Liu and Li [20] showed that d_{k+1}^LSDY satisfies the sufficient descent condition ∀k, i.e.,
Case 11: If 0<ζk<1, 0<ωk<1, ξk=0 and 0<ζk+ωk<1, then there exist constants φ1,φ2,φ3,φ4 such that 0<φ1≤ζk≤φ3<1 and 0<φ2≤ωk≤φ4<1. Now, the relation (27) becomes
Hence
Thus, it has been shown that d_{k+1}^LSDYPRP fulfills the sufficient descent condition.
Case 12: If 0<ξ_k<1, 0<ω_k<1, ζ_k=0 and ξ_k+ω_k=1, then the relation (27) becomes
Abubakar et al. [37] showed that d_{k+1}^FRDY satisfies the sufficient descent condition ∀k, i.e.,
Case 13: If 0<ξk<1, 0<ωk<1, ζk=0 and 0<ξk+ωk<1, then the relation (27) becomes
Hanachi et al. proved in [21] that d_{k+1}^FRDYPRP satisfies the sufficient descent condition ∀k, i.e.,
Case 14: If 0<ζk<1, 0<ωk<1 and 0<ξk<1 with ζk+ξk+ωk=1, then there exist constants γ1,γ2,γ3 such that 0<γ1≤ζk≤1, 0<γ2≤ξk≤1, and 0<γ3≤ωk≤1. Now, the relation (27) becomes
Hence
Thus, it has been shown that d_{k+1}^LSFRDY fulfills the sufficient descent condition.
Case 15: If 0<ζk<1, 0<ωk<1 and 0<ξk<1 with 0<ζk+ξk+ωk<1, then there exist constants ϖ1,ϖ2,ϖ3,ϖ4,ϖ5,ϖ6 such that 0<ϖ1≤ζk≤ϖ4<1, 0<ϖ2≤ξk≤ϖ5<1, and 0<ϖ3≤ωk≤ϖ6<1. Now, the relation (27) becomes
Thus, it has been shown that d_{k+1}^LSFRDYPRP fulfills the sufficient descent condition.
In this section, we study the global convergence properties of the proposed CG method. For further consideration, we need Assumptions (1) and (2), and the following important result, established by Zoutendijk [38] and Wolfe [4].
Theorem 4.1. Suppose that Assumptions (1) and (2) hold. Consider the iterative method (2), in which, ∀k ≥ 0, the search direction d_k is a descent direction and the step size α_k satisfies the strong Wolfe conditions (4) and (5). Then
Σ_{k≥0} (g_k^T d_k)² / ‖d_k‖² < +∞. (48)
Inequality (48) is called the Zoutendijk condition.
Lemma 1. Assume that Assumptions (1) and (2) hold. Let d_k be a descent direction satisfying (23), and suppose that the step size α_k satisfies (5). Then the step size α_k satisfies the following lower bound:
α_k ≥ (1 − σ)(−g_k^T d_k) / (L‖d_k‖²). (49)
Proof. From (26), we have
By applying the Cauchy-Schwarz inequality and Lipschitz condition (20), we obtain
Combining the above inequalities yields
Rearranging gives the desired result (49).
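For completeness, the chain of inequalities behind (49) can be written out explicitly (a sketch using (26) and the Lipschitz condition (20)):

\[
\begin{aligned}
g_{k+1}^{T} d_k \ge \sigma\, g_k^{T} d_k
  &\;\Longrightarrow\; y_k^{T} d_k = (g_{k+1} - g_k)^{T} d_k \ge (\sigma - 1)\, g_k^{T} d_k, \\
y_k^{T} d_k \le \|y_k\|\,\|d_k\| \le L\,\|s_k\|\,\|d_k\|
  &= L\, \alpha_k \|d_k\|^{2}, \\
\text{hence}\quad
L\, \alpha_k \|d_k\|^{2} \ge (\sigma - 1)\, g_k^{T} d_k
  &\;\Longrightarrow\;
  \alpha_k \ge \frac{(1-\sigma)\left(-g_k^{T} d_k\right)}{L\,\|d_k\|^{2}} > 0 .
\end{aligned}
\]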
According to Lemma (1), Assumptions (1) and (2), the strong Wolfe conditions (4) and (5), and the descent direction condition (23), we conclude that the step size α_k obtained by the proposed method is bounded away from zero, i.e., ∃ a constant η > 0 such that
α_k ≥ η, ∀k ≥ 0. (50)
The global convergence of the hRH method is demonstrated by the following theorem:
Theorem 4.2. Assume that Assumptions (1) and (2) hold, the sequences {x_k} and {d_k} are generated by the hRH algorithm, the search direction d_k is a descent direction, and the step size α_k satisfies the strong Wolfe conditions (4) and (5). Then
lim inf_{k→∞} ‖g_k‖ = 0. (51)
Proof. A contradiction is used to prove this theorem. Assume that formula (51) does not hold. Therefore, ∃ a constant r > 0 such that
‖g_k‖ ≥ r, ∀k ≥ 1. (52)
According to Theorem (3.2), it can be deduced that
We can write it as
Then, by (26) and (53), we get
From (20), we obtain
‖y_k‖ = ‖g_{k+1} − g_k‖ ≤ L‖s_k‖ ≤ LD, (55)
where D = max{‖x − y‖ : x, y ∈ N} is the diameter of N and s_k = x_{k+1} − x_k.
We have
From (6), we obtain
By using (52) and (54), we get
The initial inequality is derived from the conditions 0 < ζ_k, ξ_k, ω_k < 1 and 0 < 1 − ζ_k − ξ_k − ω_k < 1. The second inequality uses the Cauchy-Schwarz inequality together with (23). The final inequality uses (21), (52), and (55). Therefore, (3) and (50) imply that
However, from (48), (52), and (23), we obtain
This contradicts Theorem (4.1). Therefore, equation (52) is invalid, and the claim in (51) is proven.
This section presents a numerical evaluation of the proposed hRH method against the established methods DHF [23], HLSDY [20], HLSFR [19], and HDYCDHS [24]. The evaluation used 170 benchmark test problems obtained from [39-41] with dimensions (n) ranging from 2 to 200,000. The measures adopted were the time required to solve each problem, the number of iterations, and the number of function and gradient evaluations. The implementation was performed in MATLAB (R2024b) on a personal computer with an Intel(R) Core(TM) i5-8250U CPU @ 1.60 GHz, 8.00 GB RAM, and a 64-bit Windows 11 Education operating system. The convergence criterion was ‖g_k‖ ≤ 10⁻⁶ or a maximum of 2000 iterations. A comprehensive list of the test problems is provided in Table 2. All algorithms employ the strong Wolfe line search with parameters fixed at δ = 0.0001 and σ = 0.1. The numerical comparisons are based on the performance profile framework proposed by Dolan and Moré [42]; in a performance profile plot, the uppermost curve corresponds to the best-performing method. The efficiency of a method is measured by P_s(τ), the proportion of problems that solver s solves with a performance ratio of at most τ, so that higher values of P_s(τ) indicate a more efficient and robust solver. The numerical results are compared graphically in Figs. 1-3.
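For reference, a hedged MATLAB sketch of how such Dolan-Moré profiles can be produced from a matrix of per-problem costs (rows are problems, columns are solvers, with failures encoded as Inf); this is an illustration, not the plotting script used for Figs. 1-3.

% Dolan-More performance profile P_s(tau) from a cost matrix T (np problems x ns solvers).
% T(p,s) holds, e.g., CPU time or iteration count; use Inf where solver s failed on problem p.
function perf_profile(T, names)
    [np, ns] = size(T);
    r = T ./ min(T, [], 2);                  % performance ratios relative to the best solver per problem
    tau = sort(unique(r(isfinite(r))));      % breakpoints where some curve jumps
    P = zeros(numel(tau), ns);
    for s = 1:ns
        for i = 1:numel(tau)
            P(i, s) = sum(r(:, s) <= tau(i)) / np;   % fraction of problems solved within ratio tau
        end
    end
    semilogx(tau, P, 'LineWidth', 1.2); grid on;
    xlabel('\tau'); ylabel('P_s(\tau)'); legend(names, 'Location', 'southeast');
end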
Fig. 1. Time performance for DHF, HLSDY, HLSFR, HDYCDHS, and hRH methods.
Fig. 2. Iteration count performance for DHF, HLSDY, HLSFR, HDYCDHS, and hRH methods.
Fig. 3. Objective function/gradient calculation performance for DHF, HLSDY, HLSFR, HDYCDHS, and hRH methods.
TABLE 2: List of test problems, dimensions, and initial points
In Fig. 1, we present the performance profiles for computational time. The vertical axis represents the fraction of problems solved within a given ratio, and the horizontal axis represents the time ratio τ. For 0 < τ ≤ 0.5, hRH shows the best performance, solving 77% of the problems in less time than the other methods, while HLSDY and HLSFR solve 70%, and HDYCDHS and DHF solve 67% and 51%, respectively. As τ increases to τ ≥ 0.5, hRH remains the most efficient at 95%, while HLSDY takes second place with 89%, DHF and HDYCDHS share third place with 88%, and HLSFR is last with 87%. As expected, hRH is faster in terms of computational time than the other algorithms. Moreover, for 0.5 < τ ≤ 1, our method solves 90% of the problems, whereas the second-best method, HDYCDHS, solves 82%.
Fig. 2 shows the performance profiles for the number of iterations required by each algorithm to converge. For 0 < τ ≤ 0.5, the hRH method demonstrated the highest resolution rate, solving 92% of the problems with fewer iterations, compared with approximately 82% for the other methods. Again, for τ ≥ 0.5, the efficiency of the hRH method is 95%, while that of HDYCDHS is 89% and that of the other methods is 88%. Meanwhile, for 0.5 < τ ≤ 1, the proposed method solved 95% of the problems, the HDYCDHS method 87%, and the other methods 84%. These results demonstrate that the proposed hRH algorithm reaches a solution with fewer iterations, particularly in high-dimensional or challenging problem domains.
The performance profiles for the number of function/gradient evaluations are depicted in Fig. 3. Within the range 0 < τ ≤ 0.5, hRH solves 88% of the problems with fewer evaluations, whereas HLSDY, HLSFR, and HDYCDHS solve 81% and DHF solves 76%. For τ ≥ 1, the efficiency of hRH is 95%, that of the second-best method, HDYCDHS, is 89%, and that of the others is 87%. These results show that the proposed hRH algorithm requires fewer function/gradient evaluations than the other algorithms, thereby significantly reducing the overall computational cost.
For the EXROSEN function, the proposed hRH method was the only method that solved the problem across all seven tested dimensions, whereas the comparative methods (DHF, HLSDY, HLSFR, and HDYCDHS) failed in all dimensions because of jamming, that is, generating many short steps without making significant progress toward the solution. The numerical experiments thus confirm the effectiveness of the hRH method with respect to all investigated performance measures: it consistently outperforms the other methods in terms of computational time, number of iterations, and number of function/gradient evaluations. These results demonstrate that the proposed hRH algorithm is a stable and fast method for solving unconstrained optimization problems.
In this section, we evaluate the performance of the proposed hRH algorithm in restoring color images corrupted by impulse noise.
Following the spirit of [43], we treat high-density noise in digital images using a dual-phase framework. In the first phase, we use an adaptive median-based method to identify the corrupted pixels. Assume that Z is a color image of dimensions M×N and Ω = {1,…,M}×{1,…,N} is the pixel index domain. The subset of noisy pixels is denoted by D ⊆ Ω, and |D| is the total number of noisy pixels. Every noisy pixel at location (i, j) has four immediate neighbors in the neighborhood W_{i,j} around it: {(i−1,j), (i+1,j), (i,j−1), (i,j+1)}.
We denote the observed noisy pixel value by z_{i,j} and the recovered value by v_{i,j}. The denoising process is formulated as the non-smooth optimization problem
where the regularization terms are given by
The function acts as a smooth penalty term that preserves the edges and is characterized by a positive parameter δ. The vector v = [vi,j](i,j)∈D is a lexicographically ordered list of unknowns. Directly minimizing (56) can be computationally expensive, particularly for high-resolution images with dense noise.
To resolve this issue, a smooth surrogate model was presented in [44], eliminating the non-smooth term and resulting in the following equivalent formulation:
With an increasing noise ratio, the number of corrupted pixels also increases, and the size and complexity of the problem (57) increase drastically. Nevertheless, as demonstrated in [44], CGMs remain efficient and scalable for this problem, even under severe corruption.
Nevertheless, color images require more attention than grayscale images because they are multichannel. Each color image contains three channels, red, green, and blue (RGB), and noise may affect each of these channels differently, which makes the restoration process considerably more complex. Therefore, we adopt a channel-wise restoration strategy in which each RGB component is restored independently using the same optimization procedure, as sketched below. To keep the final output realistic, care is taken to ensure that the overall structure remains consistent across channels. We show that this strategy effectively handles independently corrupted color channels without introducing artifacts in the reconstructed image.
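A hedged MATLAB sketch of this two-phase, channel-wise pipeline is given below; adaptive_median_detect and restore_channel_hRH are hypothetical helper names standing in for the noise-detection filter of [45] and for the minimization of (57) with the hRH algorithm, respectively.

% Two-phase, channel-wise restoration of a noisy RGB image Z (M x N x 3, uint8).
% Phase 1 flags corrupted pixels with an adaptive median filter; phase 2 restores only
% the flagged pixels of each channel by minimizing the smooth surrogate (57) with hRH.
function V = restore_color_image(Z)
    Z = double(Z);
    V = Z;                                                      % start from the observed image
    for c = 1:3
        noisy_mask = adaptive_median_detect(Z(:,:,c));          % hypothetical phase-1 detector
        V(:,:,c)   = restore_channel_hRH(Z(:,:,c), noisy_mask); % hypothetical phase-2 solver
    end
    V = uint8(min(max(V, 0), 255));                             % clip back to the valid intensity range
end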
We then applied this model to remove salt-and-pepper noise, which is the most prevalent form of impulse corruption in color images. An adaptive median filter [45] is first applied to detect the noisy pixels in all color channels. In the reconstruction step, the solution of (57) obtained with our proposed hRH algorithm is compared with that of several classical CG methods: the PRP, LS, and FR methods. We evaluated the performance on several widely used test images of size 768×512: Girl, Lighthouse, Hats, and Macaws. The termination condition is defined as:
The experimental platform matches the one described in section 5. Restoration quality is quantified using the peak signal-to-noise ratio (PSNR), computed as
PSNR = 10 log₁₀ (255² / MSE),
where x^r_{i,j} and x^0_{i,j} denote the restored and original pixel values, respectively, and MSE = (1/(MN)) Σ_{i,j} (x^r_{i,j} − x^0_{i,j})² is the mean squared error.
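A minimal MATLAB check of this PSNR computation for 8-bit images (assuming the peak value 255, as above) is:

% PSNR in dB between a restored image xr and the original x0 (uint8 or double in [0, 255]).
function val = psnr_db(xr, x0)
    e   = double(xr) - double(x0);
    mse = mean(e(:).^2);              % mean squared error over all pixels (and channels)
    val = 10 * log10(255^2 / mse);
end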
The key metrics, including the iteration count, computation time, and PSNR for all tested methods, are summarized in Table 3. To conserve space, we provide visual results only for noise levels of 70% and 90% in Figs. 4 and 5, illustrating the original, noisy, and restored images.
Fig. 4. (a-x) Original images (first row), noisy images with 70% salt-and-pepper noise (second row), and restored results with hRH (row 3), PRP (row 4), LS (row 5), and FR (row 6).
Fig. 5. (a-x) Original images (first row), noisy images with 90% salt-and-pepper noise (second row), and restored results with hRH (row 3), PRP (row 4), LS (row 5), and FR (row 6).
TABLE 3: Image denoising results for different methods (hRH, PRP, LS, and FR)
The proposed hRH method demonstrates a marked advantage over the classical CG methods (PRP, LS, and FR) for image denoising. While the LS algorithm provides competitive PSNR values, our hRH method excels in high-noise scenarios, delivering superior or comparable image quality where it matters most. The most significant strength of hRH lies in its convergence efficiency, as it consistently requires a fraction of the iterations needed by the other methods. Although the FR method is frequently faster in terms of computation time, it suffers from poor accuracy. Conversely, PRP and LS are generally slower without consistently outperforming hRH in quality. Overall, the hRH method provides the best trade-off between accuracy and efficiency, proving to be the most robust and reliable algorithm tested, particularly in challenging denoising tasks.
In this study, a novel hybrid CG algorithm for unconstrained optimization, called hRH, is proposed. Its update parameter β_k^hRH is a convex combination of the established LS, FR, DY, and PRP formulas, as in equation (6), where the coefficients are selected such that they satisfy the Dai-Liao conjugacy condition. The proposed method guarantees descent directions under an inexact line search and achieves global convergence under mild conditions. Computational testing of the hRH method on a range of benchmark problems showed good performance, superior to that of conventional CG methods. In addition, the practical usefulness of the proposed method was demonstrated by its successful application to image impulse denoising, where the images were restored clearly while the intrinsic structural information was preserved. These results show the versatility and applicability of the proposed method to both standard optimization problems and real-world image processing problems.
We sincerely thank the anonymous reviewers for their thoughtful comments and insightful suggestions, which helped us improve the clarity and quality of this study. We are also grateful to the editor for their kind support and careful handling of the review process.
[1] W. W. Hager and H. Zhang. "A survey of nonlinear conjugate gradient methods". Pacific Journal of Optimization, vol. 2, no. 1, pp. 35-58, 2006.
[2] M. R. Hestenes and E. Stiefel. "Methods of Conjugate Gradients for Solving Linear Systems". vol. 49, NBS, Washington, DC, 1952.
[3] H. A. Babando, M. Barma, S. Nasiru and S. Suraj. "A Dai-Liao hybrid PRP and DY schemes for unconstrained optimization". International Journal of Development Mathematics, vol. 1, no. 1, pp. 41-53, 2024.
[4] P. Wolfe. "Convergence conditions for ascent methods". SIAM Review, vol. 11, no. 2, pp. 226-235, 1969.
[5] P. Wolfe. "Convergence conditions for ascent methods. II: Some corrections". SIAM Review, vol. 13, no. 2, pp. 185-188, 1971.
[6] R. Fletcher and C. M. Reeves. "Function minimization by conjugate gradients". The Computer Journal, vol. 7, no. 2, pp. 149-154, 1964.
[7] E. Polak and G. Ribiere. "Note sur la convergence de méthodes de directions conjuguées". Revue Française d'Informatique et de Recherche Opérationnelle, Série Rouge, vol. 3, no. 16, pp. 35-43, 1969.
[8] B. T. Polyak. "The conjugate gradient method in extremal problems". USSR Computational Mathematics and Mathematical Physics, vol. 9, no. 4, pp. 94-112, 1969.
[9] R. Fletcher. "Practical Methods of Optimization". John Wiley and Sons, Hoboken, 2000.
[10] Y. Liu and C. Storey. "Efficient generalized conjugate gradient algorithms, part 1: Theory". Journal of Optimization Theory and Applications, vol. 69, pp. 129-137, 1991.
[11] Y. H. Dai and Y. Yuan. "A nonlinear conjugate gradient method with a strong global convergence property". SIAM Journal on Optimization, vol. 10, no. 1, pp. 177-182, 1999.
[12] J. Nocedal and S. J. Wright. "Numerical Optimization". Springer, Berlin, 2006.
[13] N. Andrei. "Numerical comparison of conjugate gradient algorithms for unconstrained optimization". Studies in Informatics and Control, vol. 16, no. 4, pp. 333-352, 2007.
[14] M. J. D. Powell. "Nonconvex minimization calculations and the conjugate gradient method". In: Numerical Analysis: Proceedings of the 10th Biennial Conference Held at Dundee, Scotland, June 28-July 1, 1983, Springer, pp. 122-141, 1984.
[15] Y. Dai, J. Han, G. Liu, D. Sun, H. Yin and Y. X. Yuan. "Convergence properties of nonlinear conjugate gradient methods". SIAM Journal on Optimization, vol. 10, no. 2, pp. 345-358, 2000.
[16] S. Babaie-Kafaki. "A hybrid conjugate gradient method based on a quadratic relaxation of the Dai-Yuan hybrid conjugate gradient parameter". Optimization, vol. 62, no. 7, pp. 929-941, 2013.
[17] N. Andrei. "Another nonlinear conjugate gradient algorithm for unconstrained optimization". Optimization Methods and Software, vol. 24, no. 1, pp. 89-104, 2009.
[18] S. S. Djordjević. "New hybrid conjugate gradient method as a convex combination of FR and PRP methods". Filomat, vol. 30, no. 11, pp. 3083-3100, 2016.
[19] S. S. Djordjević. "New hybrid conjugate gradient method as a convex combination of LS and FR methods". Acta Mathematica Scientia, vol. 39, pp. 214-228, 2019.
[20] J. K. Liu and S. J. Li. "New hybrid conjugate gradient method for unconstrained optimization". Applied Mathematics and Computation, vol. 245, pp. 36-43, 2014.
[21] S. B. Hanachi, B. Sellami and M. Belloufi. "New iterative conjugate gradient method for nonlinear unconstrained optimization". RAIRO-Operations Research, vol. 56, no. 4, pp. 2315-2327, 2022.
[22] S. B. Hanachi, B. Sellami and M. Belloufi. "A new family of hybrid conjugate gradient method for unconstrained optimization and its application to regression analysis". RAIRO-Operations Research, vol. 58, no. 1, pp. 613-627, 2024.
[23] N. Salihu, M. R. Odekunle, A. M. Saleh and S. Salihu. "A Dai-Liao hybrid Hestenes-Stiefel and Fletcher-Reeves methods for unconstrained optimization". International Journal of Industrial Optimization, vol. 2, no. 1, pp. 33-50, 2021.
[24] A. Hallal, M. Belloufi and B. Sellami. "An efficient new hybrid CG-method as convex combination of DY and CD and HS algorithms". RAIRO-Operations Research, vol. 56, no. 6, pp. 4047-4056, 2022.
[25] A. Perry. "A modified conjugate gradient algorithm". Operations Research, vol. 26, no. 6, pp. 1073-1078, 1978.
[26] Y. H. Dai and L. Z. Liao. "New conjugacy conditions and related nonlinear conjugate gradient methods". Applied Mathematics and Optimization, vol. 43, pp. 87-101, 2001.
[27] A. B. Abubakar and P. Kumam. "A descent Dai-Liao conjugate gradient method for nonlinear equations". Numerical Algorithms, vol. 81, pp. 197-210, 2019.
[28] N. Andrei. "A Dai-Liao conjugate gradient algorithm with clustering of eigenvalues". Numerical Algorithms, vol. 77, no. 4, pp. 1273-1282, 2018.
[29] S. Babaie-Kafaki. "On optimality of two adaptive choices for the parameter of Dai-Liao method". Optimization Letters, vol. 10, pp. 1789-1797, 2016.
[30] S. Babaie-Kafaki and R. Ghanbari. "The Dai-Liao nonlinear conjugate gradient method with optimal parameter choices". European Journal of Operational Research, vol. 234, no. 3, pp. 625-630, 2014.
[31] S. Babaie-Kafaki and R. Ghanbari. "A descent family of Dai-Liao conjugate gradient methods". Optimization Methods and Software, vol. 29, no. 3, pp. 583-591, 2014.
[32] S. Babaie-Kafaki and R. Ghanbari. "Two optimal Dai-Liao conjugate gradient methods". Optimization, vol. 64, no. 11, pp. 2277-2287, 2015.
[33] S. Babaie-Kafaki and R. Ghanbari. "Two adaptive Dai-Liao nonlinear conjugate gradient methods". Iranian Journal of Science and Technology, Transactions A: Science, vol. 42, pp. 1505-1509, 2018.
[34] B. Ivanov, G. V. Milovanović and P. S. Stanimirović. "Accelerated Dai-Liao projection method for solving systems of monotone nonlinear equations with application to image deblurring". Journal of Global Optimization, vol. 85, no. 2, pp. 377-420, 2023.
[35] M. Y. Waziri, K. Ahmed and J. Sabi'u. "A Dai-Liao conjugate gradient method via modified secant equation for system of nonlinear equations". Arabian Journal of Mathematics, vol. 9, pp. 443-457, 2020.
[36] A. B. Abubakar, P. Kumam, M. Malik and A. H. Ibrahim. "A hybrid conjugate gradient based approach for solving unconstrained optimization and motion control problems". Mathematics and Computers in Simulation, vol. 201, pp. 640-657, 2022.
[37] A. B. Abubakar, P. Kumam, M. Malik, P. Chaipunya and A. H. Ibrahim. "A hybrid FR-DY conjugate gradient algorithm for unconstrained optimization with application in portfolio selection". AIMS Mathematics, vol. 6, no. 6, pp. 6506-6527, 2021.
[38] G. Zoutendijk. "Nonlinear programming, computational methods". In: Integer and Nonlinear Programming, pp. 37-86, 1970.
[39] N. Andrei. "An unconstrained optimization test functions collection". Advanced Modeling and Optimization, vol. 10, no. 1, pp. 147-161, 2008.
[40] N. I. M. Gould, D. Orban and P. L. Toint. "CUTEr and SifDec: A constrained and unconstrained testing environment, revisited". ACM Transactions on Mathematical Software, vol. 29, no. 4, pp. 373-394, 2003.
[41] J. J. Moré, B. S. Garbow and K. E. Hillstrom. "Testing unconstrained optimization software". ACM Transactions on Mathematical Software, vol. 7, no. 1, pp. 17-41, 1981.
[42] E. D. Dolan and J. J. Moré. "Benchmarking optimization software with performance profiles". Mathematical Programming, vol. 91, pp. 201-213, 2002.
[43] R. H. Chan, C. W. Ho and M. Nikolova. "Salt-and-pepper noise removal by median-type noise detectors and detail-preserving regularization". IEEE Transactions on Image Processing, vol. 14, no. 10, pp. 1479-1485, 2005.
[44] J. F. Cai, R. Chan and B. Morini. "Minimization of an edge-preserving regularization functional by conjugate gradient type methods". In: Image Processing Based on Partial Differential Equations: Proceedings of the International Conference on PDE-Based Image Processing and Related Inverse Problems, CMA, Oslo, Springer, pp. 109-122, 2007.
[45] H. Hwang and R. A. Haddad. "Adaptive median filters: New algorithms and results". IEEE Transactions on Image Processing, vol. 4, no. 4, pp. 499-502, 1995.