¹College of Science, Department of Mathematics, University of Sulaimani, Sulaymaniyah, 46001, Kurdistan Region of Iraq, ² College of Science, Department of Mathematics, University of Kirkuk, Kirkuk, 36013, Iraq.
Received: 03-07-2025 Accepted: 16-08-2025 Published: 27-09-2025
ABSTRACT
This paper presents a new hybrid conjugate gradient (CG) method for solving large-scale unconstrained optimization problems where classical CG algorithms may not perform well. This new method integrates four classical algorithms, namely, Liu and Storey, Fletcher and Reeves, Dai and Yuan, and Polak, Ribiere, and Polyak, using a convex combination of CGs and an inexact line search based on strong Wolfe conditions, and adheres to the Dai-Liao conjugate condition to enhance convergence properties. The theoretical study established the conditions for sufficient descent and global convergence of the hybrid CG method. Numerical studies demonstrated that the proposed method significantly reduced the number of iterations and computational time, achieving superior evaluation efficiency compared with other CG methods. In particular, the effectiveness of the proposed algorithm is validated for image impulse denoising, where it can recover high-quality images while preserving significant features, implying its practical applicability to real-world signal and image processing problems.
Index Terms: Conjugate Gradient Method, Strong Wolfe Condition, Global Convergence, Unconstrained Optimization, Convex Combination, Hybrid Method, Image Processing
Conjugate gradient (CG) methods are a category of unconstrained optimization algorithms distinguished by their low memory requirements and robust local and global convergence characteristics [1]. The CG method was originally introduced by Hestenes and Stiefel [2] to solve linear systems. It was later adapted to unconstrained optimization, because solving a linear system is equivalent to minimizing a positive definite quadratic function [3]. CG methods have contributed significantly to the solution of large-scale non-linear optimization problems. The CG method is typically applied to the following unconstrained optimization problem:
min_{x∈ℝ^n} f(x), (1)
where f: ℝ^n → ℝ is a smooth non-linear function whose gradient is available.
The iterative formula of the CG method is expressed as follows,
x_{k+1} = x_k + α_k d_k, k = 0, 1, 2, …, (2)
and
d_{k+1} = −g_{k+1} + β_k d_k, d_0 = −g_0. (3)
In this context, x_0 is the initial point and x_k is the current iterate. The gradient of f at x_k is denoted by g_k = ∇f(x_k). The search direction is denoted by d_k, and the conjugate (update) parameter β_k ∈ ℝ distinguishes the various CG methods. In addition, the step size α_k > 0 is a positive scalar that can be obtained by numerous line search methods.
In this study, we use an inexact line search, namely the strong Wolfe line search, which has received considerable attention in the convergence analysis and implementation of CG methods and is characterized by the following requirements [4,5]:
f(x_k + α_k d_k) ≤ f(x_k) + δ α_k g_k^T d_k, (4)
|g(x_k + α_k d_k)^T d_k| ≤ σ |g_k^T d_k|, (5)
where δ and σ are scalars satisfying 0 < δ ≤ σ < 0.5.
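As a concrete illustration, the following minimal MATLAB sketch tests whether a candidate step size satisfies the two strong Wolfe inequalities (4) and (5); the function handles f and gradf and the function name itself are illustrative assumptions, not part of the proposed algorithm.

% Check the strong Wolfe conditions (4)-(5) for a candidate step size alpha.
% f and gradf are function handles; x is the current iterate, d the search direction,
% and delta, sigma are the line-search parameters with 0 < delta <= sigma < 0.5.
function ok = strong_wolfe_ok(f, gradf, x, d, alpha, delta, sigma)
    g  = gradf(x);                   % gradient at the current point
    gd = g' * d;                     % directional derivative g_k^T d_k (negative for a descent direction)
    x_new = x + alpha * d;
    suff_decrease = f(x_new) <= f(x) + delta * alpha * gd;      % condition (4)
    curvature     = abs(gradf(x_new)' * d) <= sigma * abs(gd);  % condition (5)
    ok = suff_decrease && curvature;
end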
Assume that ‖·‖ denotes the Euclidean norm and y_k = g_{k+1} − g_k. If f is a strongly convex quadratic, then, in theory, all six choices of the update parameter in Table 1 are equivalent when an exact line search is used [12]. However, they behave quite differently for general non-quadratic functions with an inexact line search. Different values of the scalar parameter β_k yield different CG methods, which can be classified into two groups.
TABLE 1: Choices of βk in standard conjugate gradient methods.
In the first group, all CG parameters share the same numerator g_{k+1}^T y_k. These methods were proposed by Hestenes and Stiefel (HS), Polak, Ribiere, and Polyak (PRP), and Liu and Storey (LS), with the following CG parameters [1]:
β_k^HS = g_{k+1}^T y_k / (d_k^T y_k), β_k^PRP = g_{k+1}^T y_k / ‖g_k‖², β_k^LS = g_{k+1}^T y_k / (−g_k^T d_k).
In the second group, ‖g_{k+1}‖² is the common numerator of all the CG parameters. These methods were proposed by Fletcher and Reeves (FR), Fletcher (conjugate descent, CD), and Dai and Yuan (DY), with the following CG parameters [1]:
β_k^FR = ‖g_{k+1}‖² / ‖g_k‖², β_k^CD = ‖g_{k+1}‖² / (−g_k^T d_k), β_k^DY = ‖g_{k+1}‖² / (d_k^T y_k).
Each CG method has distinct advantages and disadvantages. The second group of FR, CD, and DY methods, with ‖g_{k+1}‖² in the numerator, has noticeably worse numerical performance than the first group of HS, PRP, and LS algorithms [13]. The poor numerical performance of the second group can be attributed to the jamming phenomenon, whereby the algorithm may execute many short iterations without making significant progress toward the minimum [13]. Nonetheless, the first group of HS, PRP, and LS, with g_{k+1}^T y_k in the numerator, inherently has an approximate automatic restart mechanism that prevents jamming. Specifically, when the step α_k is small, the factor y_k in the numerator approaches zero. Consequently, β_k decreases, and the new search direction d_{k+1} approaches the steepest descent direction. However, Powell [14] constructed a three-dimensional counterexample demonstrating that the PRP and HS methods may cycle indefinitely without approaching a solution, indicating their lack of global convergence under certain conditions.
The FR, CD, and DY methods generally exhibit strong global convergence properties [15], although their computational performance may be limited [13]. Conversely, the HS, PRP, and LS methods can fail to converge consistently, even though they often exhibit superior computational efficiency [13]. A comprehensive review of the various CG approaches is given by Hager and Zhang [1].
The most important category of CG algorithms is hybrid CG algorithms. The goal of this study is to improve computational speed while maintaining robust global convergence by combining four types of CG methods. Hybrid CG methods typically switch from a parameter in the second group to one in the first group when the iterations become stuck [16]. Several hybrid CG methods have been proposed recently. Andrei proposed the hybrid method β_k^hyb = (1 − θ_k)β_k^HS + θ_k β_k^DY [17]. Djordjević proposed the hybrid methods β_k^hyb = (1 − θ_k)β_k^PRP + θ_k β_k^FR and β_k^hyb = (1 − θ_k)β_k^LS + θ_k β_k^FR [18,19]. Babando et al. proposed the hybrid method β_k^hyb = (1 − θ_k)β_k^PRP + θ_k β_k^DY [3]. Liu and Li introduced a new hybrid approach combining two methods, β_k^hyb = (1 − θ_k)β_k^LS + θ_k β_k^DY [20]. Hanachi et al. presented hybrid approaches that combine three CG methods [21,22].
Inspired by the research of Hanachi et al. [21] and [22], we propose a new hybrid CG method that combines the LS, FR, DY, and PRP methods to solve problem (1). This study represents the first effort to propose and analyze a hybrid CG method based on a convex combination of four classical CG methods; to the best of our knowledge, no previous study has explored such a combination for developing CG algorithms. The remainder of this paper is organized as follows: In section 2, we introduce the hybrid CG approach and show how the parameters ζ_k, ξ_k, and ω_k are obtained. Under mild conditions, we show that the proposed method, used with a strong Wolfe line search, provides directions that satisfy the sufficient descent condition. The algorithm is described in section 3. In section 4, we examine the convergence properties of the proposed method. Section 5 reports comprehensive numerical comparisons with several existing hybrid methods on 170 distinct test problems, demonstrating the improved efficiency of the proposed method, and section 6 validates the method on an application to image impulse denoising. Section 7 concludes with a brief summary.
In this section, to enhance the performance of the CG updating parameters proposed by Liu and Storey [10], Fletcher and Reeves [6], Dai and Yuan [11], and Polak and Ribiere [7] and Polyak [8], we employ their convex combination, in which the parameter β_k of the proposed method, referred to as β_k^hRH, is given by
β_k^hRH = ζ_k β_k^LS + ξ_k β_k^FR + ω_k β_k^DY + (1 − ζ_k − ξ_k − ω_k) β_k^PRP. (6)
The parameters ζ_k, ξ_k, and ω_k in equation (6), which satisfy 0 ≤ ζ_k, ξ_k, ω_k ≤ 1, are referred to as hybridization parameters. These parameters are obtained in a specific manner, which will be discussed in detail later, and the search direction d_k is computed from
d_{k+1} = −g_{k+1} + β_k^hRH d_k, d_0 = −g_0. (7)
To determine the step length αk, we utilize the strong Wolfe line search (4), (5).
There are 15 separate cases depending on the parameter values where 0≤ζk, ξk, ωk≤1 and ζk+ξk+ωk≤1.
It is clear from equations (6) and (7) that
For both numerical computation and convergence analysis, the traditional conjugacy condition
d_{k+1}^T y_k = 0 (9)
is crucial in the CG method. This condition relies on an exact line search to obtain the step size α_k; however, in practical computations, where an inexact line search is commonly utilized, the condition may not be strictly satisfied.
In 1978, Perry [25] generalized the conjugacy condition by incorporating ideas from quasi-Newton methods. In particular, Perry modified the standard conjugacy condition using the secant equation, obtaining
d_{k+1}^T y_k = −g_{k+1}^T s_k. (10)
Perry's extension aligns the CG method with the quasi-Newton direction and makes it more flexible and robust. However, it is still contingent on the exact line search and is therefore not particularly effective in certain numerical optimization procedures; in practice, inexact rather than exact line searches are used to determine the step size α_k.
Therefore, in 2001, Dai and Liao [26] suggested a less stringent conjugacy condition, called the D-L conjugacy condition:
d_{k+1}^T y_k = −t g_{k+1}^T s_k, t ≥ 0, (11)
where s_k = x_{k+1} − x_k and t is a non-negative modulating parameter. The selection of an optimal value for t remains an active area of research, with several potential choices proposed in [27-35]. Notably, for t = 0, equation (11) reduces to equation (9), whereas for t = 1 it corresponds to equation (10). To select the parameter t, this study adopts the choice specified in [30]. The parameters ζ_k, ξ_k, and ω_k are chosen by applying the D-L conjugacy condition (11). Taking the inner product of equation (3) with the vector y_k and substituting β_k from equation (6) yields
By enforcing the D-L condition d_{k+1}^T y_k = −t g_{k+1}^T s_k, we obtain
To simplify the presentation, we define the following terms:
Rearranging the equation to solve for the hybridization parameters yields
This is an underdetermined linear equation in ζ_k, ξ_k, and ω_k. For our algorithm, we propose a strategy in which one parameter is computed from the values of the other two (which can be set to default or previous values). This yields the following update rule:
Note that while these formulas appear simpler, substituting the full expressions for ALS, AFR, ADY, and APRP would recover more detailed forms of the equations. However, this formulation makes the underlying structure of the linear system explicit. In our algorithm, care is taken to handle cases in which any denominator is close to zero.
The parameters ζ_k, ξ_k, and ω_k may lie outside the interval [0, 1]. The following rule is applied to obtain a valid convex combination in equation (6): if any of ζ_k, ξ_k, ω_k is less than or equal to 0, it is set to 0 in (6); if any is greater than or equal to 1, it is set to 1 in (6); and if ζ_k + ξ_k + ω_k ≥ 1, the parameters are rescaled so that their sum equals 1 in (6).
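As a hedged MATLAB illustration of this rule (variable names are ours, not from the reference implementation), the parameters can be clamped to [0, 1] and, when their sum exceeds 1, rescaled so that the combination in (6) remains convex:

% Clamp the hybridization parameters to [0, 1] and rescale if their sum exceeds 1,
% so that zeta, xi, omega and 1 - zeta - xi - omega all remain in [0, 1].
zeta  = max(0, min(1, zeta));
xi    = max(0, min(1, xi));
omega = max(0, min(1, omega));
s = zeta + xi + omega;
if s > 1
    zeta = zeta / s;  xi = xi / s;  omega = omega / s;   % normalize the sum to 1
end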
Algorithm 1: Hybrid Rega and Hawraz (hRH) Algorithm
Require: Initial point x0∈Rn, parameters δ,σ with 0 <δ≤σ<0.5, tolerance: ϵ = 10−6
Initialization: Set k = 0, compute f(x0), g0 =∇f(x0). If ||g0 || < ϵ, stop.
Set: d0 = −g0.
while ||gk||≥ϵ do
Line Search: Find step size αk > 0 satisfying strong Wolfe conditions (4) and (5).
Update variables:
x_{k+1} = x_k + α_k d_k; g_{k+1} = ∇f(x_{k+1})
s_k = x_{k+1} − x_k; y_k = g_{k+1} − g_k.
Compute hybridization parameters:
Compute ALS, AFR, ADY, APRP using equations (12)-(15).
Compute initial values for ζk,ξk,ωk using equations (16)-(18).
If any denominator is near zero, the corresponding parameter is set to zero.
Enforce convex combination constraints: Clamp each parameter to the interval [0, 1]:
ζk← max(0, min(1, ζk))
ξk← max(0, min(1, ξk))
ωk← max(0, min(1, ωk))
if ζk+ξk+ωk > 1 then
Normalize the sum to 1:
end if
Compute conjugate parameter: Calculate β_k^hRH using equation (6).
Update search direction: Set d_{k+1} = −g_{k+1} + β_k^hRH d_k
Restart condition (Powell):
if |g_{k+1}^T g_k| ≥ 0.2‖g_{k+1}‖² then set d_{k+1} = −g_{k+1}
end if
Iteration Update: Set k← k+1
end while
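The following MATLAB sketch outlines one possible realization of the main loop of Algorithm 1. It is a hedged illustration only: strong_wolfe_search and hybrid_params are hypothetical helper names standing in for the strong Wolfe line search and for the computation of ζ_k, ξ_k, ω_k via (16)-(18), respectively, and the listing is not the authors' reference implementation.

function [x, fval] = hRH_sketch(f, gradf, x, delta, sigma, tol, maxit)
% Hedged sketch of the hRH iteration (Algorithm 1); helper routines are hypothetical.
    g = gradf(x);  d = -g;  k = 0;
    while norm(g) >= tol && k < maxit
        alpha = strong_wolfe_search(f, gradf, x, d, delta, sigma);   % hypothetical line search
        x_new = x + alpha * d;
        g_new = gradf(x_new);
        s = x_new - x;   y = g_new - g;
        % Hybridization parameters (16)-(18), assumed clamped/normalized so that (6) stays convex.
        [zeta, xi, omega] = hybrid_params(g, g_new, d, s, y);        % hypothetical helper
        beta_LS  = (g_new' * y) / (-g' * d);        % Liu-Storey
        beta_FR  = (g_new' * g_new) / (g' * g);     % Fletcher-Reeves
        beta_DY  = (g_new' * g_new) / (d' * y);     % Dai-Yuan
        beta_PRP = (g_new' * y) / (g' * g);         % Polak-Ribiere-Polyak
        beta = zeta*beta_LS + xi*beta_FR + omega*beta_DY + (1 - zeta - xi - omega)*beta_PRP;
        if abs(g_new' * g) >= 0.2 * (g_new' * g_new)   % Powell restart test
            d = -g_new;
        else
            d = -g_new + beta * d;
        end
        x = x_new;  g = g_new;  k = k + 1;
    end
    fval = f(x);
end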
3. SUFFICIENT DESCENT CONDITION
In this section, we demonstrate the sufficient descent property of the proposed method, which is crucial for establishing global convergence. The following fundamental assumptions on the objective function are required to prove the global convergence of the hRH method.
Assumption 1. The level set L = {x ∈ ℝ^n : f(x) ≤ f(x_0)} is bounded, i.e., ∃ a constant B > 0 such that
‖x‖ ≤ B, ∀x ∈ L. (19)
Assumption 2. In a neighborhood N of L, the function f is continuously differentiable, and its gradient ∇f(x) is Lipschitz continuous, that is, ∃ a constant 0<L<∞ such that:
‖∇f(x) − ∇f(y)‖ ≤ L‖x − y‖, ∀x, y ∈ N. (20)
Under Assumptions (1) and (2) on f, ∃ a constant c such that
‖∇f(x)‖ ≤ c, ∀x ∈ L. (21)
Theorem 3.1. If the relations (6) and (7) are valid, then
d_{k+1} = ζ_k d_{k+1}^LS + ξ_k d_{k+1}^FR + ω_k d_{k+1}^DY + (1 − ζ_k − ξ_k − ω_k) d_{k+1}^PRP, (22)
where d_{k+1}^LS, d_{k+1}^FR, d_{k+1}^DY, and d_{k+1}^PRP denote the search directions generated by the LS, FR, DY, and PRP methods, respectively.
Proof. Now, from (7), we have
Substituting equation (6), we can write
It follows that
Here, we obtain
Thus, the theorem has been proven.
Theorem 3.2. Let the sequences {g_k} and {d_k} be generated by the hRH method. Then the search direction d_k satisfies the sufficient descent condition, i.e., ∃ a constant c > 0 such that
g_k^T d_k ≤ −c‖g_k‖², ∀k ≥ 0. (23)
Proof. According to the hRH method, when the Powell restart criterion is satisfied, that is, |g_{k+1}^T g_k| ≥ 0.2‖g_{k+1}‖², it follows that d_{k+1} = −g_{k+1}, and (23) is trivially valid. We therefore proceed under the assumption that the Powell restart criterion is not satisfied, i.e.,
|g_{k+1}^T g_k| < 0.2‖g_{k+1}‖². (24)
We prove the claim by mathematical induction. For k = 0, d_0 = −g_0, so g_0^T d_0 = −‖g_0‖², and the sufficient descent condition is satisfied for k = 0. We now assume that (23) holds for k; thus
g_k^T d_k ≤ −c‖g_k‖² < 0. (25)
We now show that the claim is true for k+1. From the strong Wolfe condition (5),
σ g_k^T d_k ≤ g_{k+1}^T d_k ≤ −σ g_k^T d_k. (26)
Multiplying (22) by g_{k+1}^T from the left yields
It is necessary to prove 15 cases depending on the parameter values:
Case 1: If ζk=1, ξk=0 and ωk=0, then the relation (27) becomes
The search direction for the LS method is given by
d_{k+1}^LS = −g_{k+1} + β_k^LS d_k. (28)
Taking the inner product of (28) with g_{k+1} yields
Using the definition y_k = g_{k+1} − g_k, we can expand the term g_{k+1}^T y_k as g_{k+1}^T y_k = ‖g_{k+1}‖² − g_{k+1}^T g_k. Taking the absolute value and applying the triangle inequality, we obtain
|g_{k+1}^T y_k| ≤ ‖g_{k+1}‖² + |g_{k+1}^T g_k|.
From the second strong Wolfe condition (5), and since d_k is a descent direction by (25), we obtain
|g_{k+1}^T d_k| ≤ σ|g_k^T d_k| = −σ g_k^T d_k. (29)
Substituting (29) into our main expression, we obtain
Since we assumed that the Powell restart condition is not met, we have from (24) that |g_{k+1}^T g_k| ≤ 0.2‖g_{k+1}‖². Substituting into (30):
Let c1 = (1−1.2σ). Since σ<0.5, we have c1>0. Therefore:
Therefore, the search direction d_{k+1}^LS fulfills the sufficient descent condition.
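For completeness, the chain of estimates used in this case can be written out as follows (a sketch under the stated assumptions, with (26) and (24) supplying the two bounds):

\[
\begin{aligned}
g_{k+1}^{T} d_{k+1}^{LS}
  &= -\|g_{k+1}\|^{2} + \frac{g_{k+1}^{T} y_k}{-g_k^{T} d_k}\, g_{k+1}^{T} d_k \\
  &\le -\|g_{k+1}\|^{2}
     + \frac{\left(\|g_{k+1}\|^{2} + |g_{k+1}^{T} g_k|\right)\left(-\sigma g_k^{T} d_k\right)}{-g_k^{T} d_k}
     \qquad \text{since } |g_{k+1}^{T} d_k| \le -\sigma g_k^{T} d_k \\
  &\le -\|g_{k+1}\|^{2} + 1.2\,\sigma \|g_{k+1}\|^{2}
     \qquad \text{since } |g_{k+1}^{T} g_k| \le 0.2\|g_{k+1}\|^{2} \\
  &= -(1 - 1.2\sigma)\,\|g_{k+1}\|^{2} = -c_1 \|g_{k+1}\|^{2}.
\end{aligned}
\]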
Case 2: If ζk=0, ξk=1 and ωk=0, then the relation (27) becomes
A sufficient descent condition for the FR method under strong Wolfe conditions was established in [1]. Thus, ∃ a constant c2 > 0 such that
Case 3: If ζk=0, ξk=0 and ωk=1, then the relation (27) becomes
Thus, it has been shown that d_{k+1}^DY fulfills the sufficient descent condition.
Case 4: If ζk=0, ξk=0 and ωk=0 then the relation (27) becomes
We demonstrate the sufficient descent condition of the PRP method by the following theorem.
Theorem 3.3. [19] Under Assumptions (1) and (2), suppose that the step sizes αk satisfy conditions (4) and (5) with σ<½. If ‖sk‖→0 and there exist non-negative constants τ1 and τ2 such that
where s_k = x_{k+1} − x_k = α_k d_k. Then d_{k+1}^PRP satisfies the sufficient descent condition ∀k.
Proof.
Now, multiplying (34) by g_{k+1}^T from the left, we get
By the Cauchy-Schwarz inequality, we have
From (20), ‖y_k‖ ≤ L‖s_k‖, so
By using (33), we get
Using (32), it further becomes
However, because of the assumption ‖s_k‖ → 0, the second summand in (42) tends to zero, so there exists a number 0 < γ < 1 such that
Now, (42) becomes
where c_4 = 1 − γ > 0. (47)
Therefore, a sufficient descent condition holds for all k.
Case 5: If ξk=0, ωk=0 and 0<ζk<1, then the relation (27) becomes
The sufficient descent condition is satisfied in this case, as discussed in [36], i.e.,
Case 6: If ζk=0, 0<ξk<1 and ωk=0, then the relation (27) becomes
Djordjević established in [18] that the sufficient descent condition is satisfied for d_{k+1}^FRPRP.
Case 7: If ζk=0, ξk=0 and 0<ωk<1, then the relation (27) becomes
Babando et al. [3] proved that the sufficient descent condition holds for d_{k+1}^DYPRP.
Case 8: If ζk+ξk=1, 0<ζk<1, 0<ξk<1 and ωk=0, then the relation (27) becomes
Djordjević [19] showed that the sufficient descent condition is satisfied for d_{k+1}^LSFR.
Case 9: If 0<ζk<1, 0<ξk<1, ωk=0 and 0<ζk+ξk<1, then the relation (27) becomes
Hanachi et al. proved in [22] that d_{k+1}^LSFRPRP satisfies the sufficient descent condition ∀k, i.e.,
Case 10: If 0<ζk<1, 0<ωk<1, ξk=0 and ζk+ωk=1, then the relation (27) becomes
Liu and Li [20] showed that d_{k+1}^LSDY satisfies the sufficient descent condition ∀k, i.e.,
Case 11: If 0<ζk<1, 0<ωk<1, ξk=0 and 0<ζk+ωk<1, then there exist constants φ1,φ2,φ3,φ4 such that 0<φ1≤ζk≤φ3<1 and 0<φ2≤ωk≤φ4<1. Now, the relation (27) becomes
Hence
Thus, it has been shown that d_{k+1}^LSDYPRP fulfills the sufficient descent condition.
Case 12: If 0<ξ_k<1, 0<ω_k<1, ζ_k=0 and ξ_k+ω_k=1, then the relation (27) becomes
Abubakar et al. [37] showed that d_{k+1}^FRDY satisfies the sufficient descent condition ∀k, i.e.,
Case 13: If 0<ξk<1, 0<ωk<1, ζk=0 and 0<ξk+ωk<1, then the relation (27) becomes
Hanachi et al. proved in [21] that d_{k+1}^FRDYPRP satisfies the sufficient descent condition ∀k, i.e.,
Case 14: If 0<ζk<1, 0<ωk<1 and 0<ξk<1 with ζk+ξk+ωk=1, then there exist constants γ1,γ2,γ3 such that 0<γ1≤ζk≤1, 0<γ2≤ξk≤1, and 0<γ3≤ωk≤1. Now, the relation (27) becomes
Hence
Thus, it has been shown that d_{k+1}^LSFRDY fulfills the sufficient descent condition.
Case 15: If 0<ζk<1, 0<ωk<1 and 0<ξk<1 with 0<ζk+ξk+ωk<1, then there exist constants ϖ1,ϖ2,ϖ3,ϖ4,ϖ5,ϖ6 such that 0<ϖ1≤ζk≤ϖ4<1, 0<ϖ2≤ξk≤ϖ5<1, and 0<ϖ3≤ωk≤ϖ6<1. Now, the relation (27) becomes
Thus, it has been shown that d_{k+1}^LSFRDYPRP fulfills the sufficient descent condition.
In this section, we study the global convergence properties of the proposed CG method. For further consideration, we need Assumptions (1) and (2), and the following important result, established by Zoutendijk [38] and Wolfe [4].
Theorem 4.1. Suppose that Assumptions (1) and (2) hold. Consider the iterative method (2), in which, ∀k ≥ 0, the search direction d_k is a descent direction and the step size α_k satisfies the strong Wolfe conditions (4) and (5). Then
Σ_{k≥0} (g_k^T d_k)² / ‖d_k‖² < +∞. (48)
Inequality (48) is called the Zoutendijk condition.
Lemma 1. Assume that Assumptions (1) and (2) hold. Let d_k be a descent direction satisfying (23), and suppose that the step size α_k satisfies (5). Then the step size α_k satisfies the following lower bound:
α_k ≥ (1 − σ)(−g_k^T d_k) / (L‖d_k‖²). (49)
Proof. From (26), we have
By applying the Cauchy-Schwarz inequality and Lipschitz condition (20), we obtain
Combining the above inequalities yields
Rearranging gives the desired result (49).
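For completeness, the chain of inequalities behind (49) can be written out explicitly (a sketch using (26) and the Lipschitz condition (20)):

\[
\begin{aligned}
g_{k+1}^{T} d_k \ge \sigma\, g_k^{T} d_k
  &\;\Longrightarrow\; y_k^{T} d_k = (g_{k+1} - g_k)^{T} d_k \ge (\sigma - 1)\, g_k^{T} d_k, \\
y_k^{T} d_k \le \|y_k\|\,\|d_k\| \le L\,\|s_k\|\,\|d_k\|
  &= L\, \alpha_k \|d_k\|^{2}, \\
\text{hence}\quad
L\, \alpha_k \|d_k\|^{2} \ge (\sigma - 1)\, g_k^{T} d_k
  &\;\Longrightarrow\;
  \alpha_k \ge \frac{(1-\sigma)\left(-g_k^{T} d_k\right)}{L\,\|d_k\|^{2}} > 0 .
\end{aligned}
\]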
According to Lemma (1), Assumptions (1) and (2), the strong Wolfe conditions (4) and (5), and the descent direction condition (23), we conclude that the step size α_k obtained by the proposed method is bounded away from zero, i.e., ∃ a constant η > 0 such that
α_k ≥ η, ∀k ≥ 0. (50)
The global convergence of the hRH method is demonstrated by the following theorem:
Theorem 4.2. Assume that Assumptions (1) and (2) hold, the sequences {x_k} and {d_k} are generated by the hRH algorithm, the search direction d_k is a descent direction, and the step size α_k satisfies the strong Wolfe conditions (4) and (5). Then
lim inf_{k→∞} ‖g_k‖ = 0. (51)
Proof. A contradiction is used to prove this theorem. Assume that formula (51) does not hold. Therefore, ∃ a constant r > 0 such that
‖g_k‖ ≥ r, ∀k ≥ 1. (52)
According to Theorem (3.2), it can be deduced that
We can write it as
Then, by (26) and (53), we get
From (20), we obtain
‖y_k‖ = ‖g_{k+1} − g_k‖ ≤ L‖s_k‖ ≤ LD, (55)
where D = max{‖x − y‖ : x, y ∈ N} is the diameter of N and s_k = x_{k+1} − x_k.
We have
From (6), we obtain
By using (52) and (54), we get
The initial inequality is derived from the conditions 0 < ζ_k, ξ_k, ω_k < 1 and 0 < 1 − ζ_k − ξ_k − ω_k < 1. The second inequality uses the Cauchy-Schwarz inequality together with (23). The final inequality uses (21), (52), and (55). Therefore, (3) and (50) imply that
However, from (48), (52), and (23), we obtain
This contradicts Theorem (4.1). Therefore, equation (52) is invalid, and the claim in (51) is proven.
This section presents a numerical evaluation of the proposed hRH method against the established methods DHF [23], HLSDY [20], HLSFR [19], and HDYCDHS [24]. The evaluation used 170 benchmark test problems obtained from [39-41] with dimensions (n) ranging from 2 to 200,000. The measures adopted were the time required to solve each problem, the number of iterations, and the number of function and gradient evaluations. The implementation was performed in MATLAB (R2024b) on a personal computer with an Intel(R) Core(TM) i5-8250U CPU @ 1.60 GHz, 8.00 GB RAM, and a 64-bit Windows 11 Education operating system. The convergence criterion was ‖g_k‖ ≤ 10⁻⁶ or a maximum of 2000 iterations. A comprehensive list of the test problems is provided in Table 2. All algorithms employ the strong Wolfe line search with parameters fixed at δ = 0.0001 and σ = 0.1. The numerical comparisons are based on the performance profile framework proposed by Dolan and Moré [42]; in a performance profile plot, the uppermost curve corresponds to the best-performing method. The efficiency of a method is measured by P_s(τ), the proportion of problems that solver s solves with a performance ratio of at most τ, so that higher values of P_s(τ) indicate a more efficient and robust solver. The numerical results are compared graphically in Figs. 1-3.
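For reference, a hedged MATLAB sketch of how such Dolan-Moré profiles can be produced from a matrix of per-problem costs (rows are problems, columns are solvers, with failures encoded as Inf); this is an illustration, not the plotting script used for Figs. 1-3.

% Dolan-More performance profile P_s(tau) from a cost matrix T (np problems x ns solvers).
% T(p,s) holds, e.g., CPU time or iteration count; use Inf where solver s failed on problem p.
function perf_profile(T, names)
    [np, ns] = size(T);
    r = T ./ min(T, [], 2);                  % performance ratios relative to the best solver per problem
    tau = sort(unique(r(isfinite(r))));      % breakpoints where some curve jumps
    P = zeros(numel(tau), ns);
    for s = 1:ns
        for i = 1:numel(tau)
            P(i, s) = sum(r(:, s) <= tau(i)) / np;   % fraction of problems solved within ratio tau
        end
    end
    semilogx(tau, P, 'LineWidth', 1.2); grid on;
    xlabel('\tau'); ylabel('P_s(\tau)'); legend(names, 'Location', 'southeast');
end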
Fig. 1. Time performance for DHF, HLSDY, HLSFR, HDYCDHS, and hRH methods.
Fig. 2. Iteration count performance for DHF, HLSDY, HLSFR, HDYCDHS, and hRH methods.
Fig. 3. Objective function/gradient calculation performance for DHF, HLSDY, HLSFR, HDYCDHS, and hRH methods.
TABLE 2: List of test problems, dimensions, and initial points
In Fig. 1, we present the performance profiles for computational time. The vertical axis represents the fraction of problems solved within a given ratio, and the horizontal axis represents the time ratio τ. For 0 < τ ≤ 0.5, hRH shows the best performance, solving 77% of the problems in less time than the other methods, while HLSDY and HLSFR solve 70%, and HDYCDHS and DHF solve 67% and 51%, respectively. As τ increases to τ ≥ 0.5, hRH remains the most efficient at 95%, while HLSDY takes second place with 89%, DHF and HDYCDHS share third place with 88%, and HLSFR is last with 87%. As expected, hRH is faster in terms of computational time than the other algorithms. Moreover, for 0.5 < τ ≤ 1, our method solves 90% of the problems, whereas the second-best method, HDYCDHS, solves 82%.
Fig. 2 shows the performance profiles for the number of iterations required by each algorithm to converge. For 0 < τ ≤ 0.5, the hRH method demonstrated the highest resolution rate, solving 92% of the problems with fewer iterations, compared with approximately 82% for the other methods. Again, for τ ≥ 0.5, the efficiency of the hRH method is 95%, while that of HDYCDHS is 89% and that of the other methods is 88%. Meanwhile, for 0.5 < τ ≤ 1, the proposed method solved 95% of the problems, the HDYCDHS method 87%, and the other methods 84%. These results demonstrate that the proposed hRH algorithm reaches a solution with fewer iterations, particularly in high-dimensional or challenging problem domains.
The performance profiles for the number of function/gradient evaluations are depicted in Fig. 3. Within the range 0 < τ ≤ 0.5, hRH solves 88% of the problems with fewer evaluations, whereas HLSDY, HLSFR, and HDYCDHS solve 81% and DHF solves 76%. For τ ≥ 1, the efficiency of hRH is 95%, that of the second-best method, HDYCDHS, is 89%, and that of the others is 87%. These results show that the proposed hRH algorithm requires fewer function/gradient evaluations than the other algorithms, thereby significantly reducing the overall computational cost.
For the EXROSEN function, the proposed hRH method was the only method that solved the problem across all seven tested dimensions, whereas the comparative methods (DHF, HLSDY, HLSFR, and HDYCDHS) failed in all dimensions because of jamming, that is, generating many short steps without making significant progress toward the solution. The numerical experiments thus confirm the effectiveness of the hRH method with respect to all investigated performance measures: it consistently outperforms the other methods in terms of computational time, number of iterations, and number of function/gradient evaluations. These results demonstrate that the proposed hRH algorithm is a stable and fast method for solving unconstrained optimization problems.
In this section, we evaluate the performance of the proposed hRH algorithm in restoring color images corrupted by impulse noise.
Following the spirit of [43], we treat high-density noise in digital images using a dual-phase framework. In the first phase, we use an adaptive median-based method to identify the corrupted pixels. Assume that Z is a color image of dimensions M×N and Ω = {1,…,M}×{1,…,N} is the pixel index domain. The subset of noisy pixels is denoted by D ⊆ Ω, and |D| is the total number of noisy pixels. Every noisy pixel at location (i, j) has four immediate neighbors in the neighborhood W_{i,j} around it: {(i−1,j), (i+1,j), (i,j−1), (i,j+1)}.
We denote the observed noisy pixel value by z_{i,j} and the recovered value by v_{i,j}. The denoising process is formulated as the non-smooth optimization problem
where the regularization terms are given by
The function acts as a smooth penalty term that preserves the edges and is characterized by a positive parameter δ. The vector v = [vi,j](i,j)∈D is a lexicographically ordered list of unknowns. Directly minimizing (56) can be computationally expensive, particularly for high-resolution images with dense noise.
To resolve this issue, a smooth surrogate model was presented in [44], eliminating the non-smooth term and resulting in the following equivalent formulation:
With an increasing noise ratio, the number of corrupted pixels also increases, and the size and complexity of the problem (57) increase drastically. Nevertheless, as demonstrated in [44], CGMs remain efficient and scalable for this problem, even under severe corruption.
Nevertheless, color images require more attention than grayscale images because they are multichannel. Each color image contains three channels, red, green, and blue (RGB), and noise may affect each of these channels differently, which makes the restoration process considerably more complex. Therefore, we adopt a channel-wise restoration strategy in which each RGB component is restored independently using the same optimization procedure, as sketched below. To keep the final output realistic, care is taken to ensure that the overall structure remains consistent across channels. We show that this strategy effectively handles independently corrupted color channels without introducing artifacts in the reconstructed image.
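A hedged MATLAB sketch of this two-phase, channel-wise pipeline is given below; adaptive_median_detect and restore_channel_hRH are hypothetical helper names standing in for the noise-detection filter of [45] and for the minimization of (57) with the hRH algorithm, respectively.

% Two-phase, channel-wise restoration of a noisy RGB image Z (M x N x 3, uint8).
% Phase 1 flags corrupted pixels with an adaptive median filter; phase 2 restores only
% the flagged pixels of each channel by minimizing the smooth surrogate (57) with hRH.
function V = restore_color_image(Z)
    Z = double(Z);
    V = Z;                                                      % start from the observed image
    for c = 1:3
        noisy_mask = adaptive_median_detect(Z(:,:,c));          % hypothetical phase-1 detector
        V(:,:,c)   = restore_channel_hRH(Z(:,:,c), noisy_mask); % hypothetical phase-2 solver
    end
    V = uint8(min(max(V, 0), 255));                             % clip back to the valid intensity range
end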
We then applied this model to remove salt-and-pepper noise, which is the most prevalent form of impulse corruption in color images. An adaptive median filter [45] is first applied to detect the noisy pixels in all color channels. In the reconstruction step, the solution of (57) obtained with our proposed hRH algorithm is compared with that of several classical CG methods: the PRP, LS, and FR methods. We evaluated the performance on several widely used test images of size 768×512: Girl, Lighthouse, Hats, and Macaws. The termination condition is defined as:
The experimental platform matches the one described in section 5. Restoration quality is quantified using the peak signal-to-noise ratio (PSNR), computed as
PSNR = 10 log₁₀ (255² / MSE),
where x^r_{i,j} and x^0_{i,j} denote the restored and original pixel values, respectively, and MSE = (1/(MN)) Σ_{i,j} (x^r_{i,j} − x^0_{i,j})² is the mean squared error.
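A minimal MATLAB check of this PSNR computation for 8-bit images (assuming the peak value 255, as above) is:

% PSNR in dB between a restored image xr and the original x0 (uint8 or double in [0, 255]).
function val = psnr_db(xr, x0)
    e   = double(xr) - double(x0);
    mse = mean(e(:).^2);              % mean squared error over all pixels (and channels)
    val = 10 * log10(255^2 / mse);
end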
The key metrics, including the iteration count, computation time, and PSNR for all tested methods, are summarized in Table 3. To conserve space, we provide visual results only for noise levels of 70% and 90% in Figs. 4 and 5, illustrating the original, noisy, and restored images.
Fig. 4. (a-x) Original images (first row), noisy images with 70% salt-and-pepper noise (second row), and restored results with hRH (row 3), PRP (row 4), LS (row 5), and FR (row 6).
Fig. 5. (a-x) Original images (first row), noisy images with 90% salt-and-pepper noise (second row), and restored results with hRH (row 3), PRP (row 4), LS (row 5), and FR (row 6).
TABLE 3: Image denoising results for different methods (hRH, PRP, LS, and FR)
The proposed hRH method demonstrates a marked advantage over the classical CG methods (PRP, LS, and FR) for image denoising. While the LS algorithm provides competitive PSNR values, our hRH method excels in high-noise scenarios, delivering superior or comparable image quality where it matters most. The most significant strength of hRH lies in its convergence efficiency, as it consistently requires a fraction of the iterations needed by the other methods. Although the FR method is frequently faster in terms of computation time, it suffers from poor accuracy. Conversely, PRP and LS are generally slower without consistently outperforming hRH in quality. Overall, the hRH method provides the best trade-off between accuracy and efficiency, proving to be the most robust and reliable algorithm tested, particularly in challenging denoising tasks.
In this study, a novel hybrid CG algorithm for unconstrained optimization, called hRH, is proposed. Its update parameter β_k^hRH is a convex combination of the established LS, FR, DY, and PRP formulas, as in equation (6), where the coefficients are selected such that they satisfy the Dai-Liao conjugacy condition. The proposed method guarantees descent directions under an inexact line search and achieves global convergence under mild conditions. Computational testing of the hRH method on a range of benchmark problems showed good performance, superior to that of conventional CG methods. In addition, the practical usefulness of the proposed method was demonstrated by its successful application to image impulse denoising, where the images were restored clearly while the intrinsic structural information was preserved. These results show the versatility and applicability of the proposed method to both standard optimization problems and real-world image processing problems.
We sincerely thank the anonymous reviewers for their thoughtful comments and insightful suggestions, which helped us improve the clarity and quality of this study. We are also grateful to the editor for their kind support and careful handling of the review process.
[1] W. W. Hager and H. Zhang. "A survey of nonlinear conjugate gradient methods". Pacific Journal of Optimization, vol. 2, no. 1, pp. 35-58, 2006.
[2] M. R. Hestenes and E. Stiefel. "Methods of Conjugate Gradients for Solving Linear Systems". vol. 49, NBS, Washington, DC, 1952.
[3] H. A. Babando, M. Barma, S. Nasiru and S. Suraj. "A Dai-Liao hybrid PRP and DY schemes for unconstrained optimization". International Journal of Development Mathematics, vol. 1, no. 1, pp. 41-53, 2024.
[4] P. Wolfe. "Convergence conditions for ascent methods". SIAM Review, vol. 11, no. 2, pp. 226-235, 1969.
[5] P. Wolfe. "Convergence conditions for ascent methods. II: Some corrections". SIAM Review, vol. 13, no. 2, pp. 185-188, 1971.
[6] R. Fletcher and C. M. Reeves. "Function minimization by conjugate gradients". The Computer Journal, vol. 7, no. 2, pp. 149-154, 1964.
[7] E. Polak and G. Ribiere. "Note sur la convergence de méthodes de directions conjuguées". Revue Française d'Informatique et de Recherche Opérationnelle, Série Rouge, vol. 3, no. 16, pp. 35-43, 1969.
[8] B. T. Polyak. "The conjugate gradient method in extremal problems". USSR Computational Mathematics and Mathematical Physics, vol. 9, no. 4, pp. 94-112, 1969.
[9] R. Fletcher. "Practical Methods of Optimization". John Wiley and Sons, Hoboken, 2000.
[10] Y. Liu and C. Storey. "Efficient generalized conjugate gradient algorithms, part 1: Theory". Journal of Optimization Theory and Applications, vol. 69, pp. 129-137, 1991.
[11] Y. H. Dai and Y. Yuan. "A nonlinear conjugate gradient method with a strong global convergence property". SIAM Journal on Optimization, vol. 10, no. 1, pp. 177-182, 1999.
[12] J. Nocedal and S. J. Wright. "Numerical Optimization". Springer, Berlin, 2006.
[13] N. Andrei. "Numerical comparison of conjugate gradient algorithms for unconstrained optimization". Studies in Informatics and Control, vol. 16, no. 4, pp. 333-352, 2007.
[14] M. J. D. Powell. "Nonconvex minimization calculations and the conjugate gradient method". In: Numerical Analysis: Proceedings of the 10th Biennial Conference Held at Dundee, Scotland, June 28-July 1, 1983, Springer, pp. 122-141, 1984.
[15] Y. Dai, J. Han, G. Liu, D. Sun, H. Yin and Y. X. Yuan. "Convergence properties of nonlinear conjugate gradient methods". SIAM Journal on Optimization, vol. 10, no. 2, pp. 345-358, 2000.
[16] S. Babaie-Kafaki. "A hybrid conjugate gradient method based on a quadratic relaxation of the Dai-Yuan hybrid conjugate gradient parameter". Optimization, vol. 62, no. 7, pp. 929-941, 2013.
[17] N. Andrei. "Another nonlinear conjugate gradient algorithm for unconstrained optimization". Optimization Methods and Software, vol. 24, no. 1, pp. 89-104, 2009.
[18] S. S. Djordjević. "New hybrid conjugate gradient method as a convex combination of FR and PRP methods". Filomat, vol. 30, no. 11, pp. 3083-3100, 2016.
[19] S. S. Djordjević. "New hybrid conjugate gradient method as a convex combination of LS and FR methods". Acta Mathematica Scientia, vol. 39, pp. 214-228, 2019.
[20] J. K. Liu and S. J. Li. "New hybrid conjugate gradient method for unconstrained optimization". Applied Mathematics and Computation, vol. 245, pp. 36-43, 2014.
[21] S. B. Hanachi, B. Sellami and M. Belloufi. "New iterative conjugate gradient method for nonlinear unconstrained optimization". RAIRO-Operations Research, vol. 56, no. 4, pp. 2315-2327, 2022.
[22] S. B. Hanachi, B. Sellami and M. Belloufi. "A new family of hybrid conjugate gradient method for unconstrained optimization and its application to regression analysis". RAIRO-Operations Research, vol. 58, no. 1, pp. 613-627, 2024.
[23] N. Salihu, M. R. Odekunle, A. M. Saleh and S. Salihu. "A Dai-Liao hybrid Hestenes-Stiefel and Fletcher-Reeves methods for unconstrained optimization". International Journal of Industrial Optimization, vol. 2, no. 1, pp. 33-50, 2021.
[24] A. Hallal, M. Belloufi and B. Sellami. "An efficient new hybrid CG-method as convex combination of DY and CD and HS algorithms". RAIRO-Operations Research, vol. 56, no. 6, pp. 4047-4056, 2022.
[25] A. Perry. "A modified conjugate gradient algorithm". Operations Research, vol. 26, no. 6, pp. 1073-1078, 1978.
[26] Y. H. Dai and L. Z. Liao. "New conjugacy conditions and related nonlinear conjugate gradient methods". Applied Mathematics and Optimization, vol. 43, pp. 87-101, 2001.
[27] A. B. Abubakar and P. Kumam. "A descent Dai-Liao conjugate gradient method for nonlinear equations". Numerical Algorithms, vol. 81, pp. 197-210, 2019.
[28] N. Andrei. "A Dai-Liao conjugate gradient algorithm with clustering of eigenvalues". Numerical Algorithms, vol. 77, no. 4, pp. 1273-1282, 2018.
[29] S. Babaie-Kafaki. "On optimality of two adaptive choices for the parameter of Dai-Liao method". Optimization Letters, vol. 10, pp. 1789-1797, 2016.
[30] S. Babaie-Kafaki and R. Ghanbari. "The Dai-Liao nonlinear conjugate gradient method with optimal parameter choices". European Journal of Operational Research, vol. 234, no. 3, pp. 625-630, 2014.
[31] S. Babaie-Kafaki and R. Ghanbari. "A descent family of Dai-Liao conjugate gradient methods". Optimization Methods and Software, vol. 29, no. 3, pp. 583-591, 2014.
[32] S. Babaie-Kafaki and R. Ghanbari. "Two optimal Dai-Liao conjugate gradient methods". Optimization, vol. 64, no. 11, pp. 2277-2287, 2015.
[33] S. Babaie-Kafaki and R. Ghanbari. "Two adaptive Dai-Liao nonlinear conjugate gradient methods". Iranian Journal of Science and Technology, Transactions A: Science, vol. 42, pp. 1505-1509, 2018.
[34] B. Ivanov, G. V. Milovanović and P. S. Stanimirović. "Accelerated Dai-Liao projection method for solving systems of monotone nonlinear equations with application to image deblurring". Journal of Global Optimization, vol. 85, no. 2, pp. 377-420, 2023.
[35] M. Y. Waziri, K. Ahmed and J. Sabi'u. "A Dai-Liao conjugate gradient method via modified secant equation for system of nonlinear equations". Arabian Journal of Mathematics, vol. 9, pp. 443-457, 2020.
[36] A. B. Abubakar, P. Kumam, M. Malik and A. H. Ibrahim. "A hybrid conjugate gradient based approach for solving unconstrained optimization and motion control problems". Mathematics and Computers in Simulation, vol. 201, pp. 640-657, 2022.
[37] A. B. Abubakar, P. Kumam, M. Malik, P. Chaipunya and A. H. Ibrahim. "A hybrid FR-DY conjugate gradient algorithm for unconstrained optimization with application in portfolio selection". AIMS Mathematics, vol. 6, no. 6, pp. 6506-6527, 2021.
[38] G. Zoutendijk. "Nonlinear programming, computational methods". In: Integer and Nonlinear Programming, pp. 37-86, 1970.
[39] N. Andrei. "An unconstrained optimization test functions collection". Advanced Modeling and Optimization, vol. 10, no. 1, pp. 147-161, 2008.
[40] N. I. M. Gould, D. Orban and P. L. Toint. "CUTEr and SifDec: A constrained and unconstrained testing environment, revisited". ACM Transactions on Mathematical Software, vol. 29, no. 4, pp. 373-394, 2003.
[41] J. J. Moré, B. S. Garbow and K. E. Hillstrom. "Testing unconstrained optimization software". ACM Transactions on Mathematical Software, vol. 7, no. 1, pp. 17-41, 1981.
[42] E. D. Dolan and J. J. Moré. "Benchmarking optimization software with performance profiles". Mathematical Programming, vol. 91, pp. 201-213, 2002.
[43] R. H. Chan, C. W. Ho and M. Nikolova. "Salt-and-pepper noise removal by median-type noise detectors and detail-preserving regularization". IEEE Transactions on Image Processing, vol. 14, no. 10, pp. 1479-1485, 2005.
[44] J. F. Cai, R. Chan and B. Morini. "Minimization of an edge-preserving regularization functional by conjugate gradient type methods". In: Image Processing Based on Partial Differential Equations: Proceedings of the International Conference on PDE-Based Image Processing and Related Inverse Problems, CMA, Oslo, Springer, pp. 109-122, 2007.
[45] H. Hwang and R. A. Haddad. "Adaptive median filters: New algorithms and results". IEEE Transactions on Image Processing, vol. 4, no. 4, pp. 499-502, 1995.