Connections between Operator-Splitting Methods and Deep Neural Networks with Applications in Image Segmentation

Annals of Applied Mathematics, 2023, Issue 4

Hao Liu 1, Xue-Cheng Tai 2 and Raymond Chan 3

1 Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China

2 Norwegian Research Centre (NORCE), Nygårdsgaten 112, 5008 Bergen, Norway

3 Department of Mathematics, City University of Hong Kong and Hong Kong Center for Cerebro-Cardiovascular Health Engineering, Hong Kong, China

Abstract. The deep neural network is a powerful tool for many tasks. Understanding why it is so successful and providing a mathematical explanation is an important problem and has been a popular research direction in recent years. In the literature on the mathematical analysis of deep neural networks, much work is dedicated to establishing representation theories. How to make connections between deep neural networks and mathematical algorithms is still under development. In this paper, we give an algorithmic explanation for deep neural networks, especially in their connections with operator splitting. We show that with certain splitting strategies, operator-splitting methods have the same structure as networks. Utilizing this connection and the Potts model for image segmentation, two networks inspired by operator-splitting methods are proposed. The two networks are essentially two operator-splitting algorithms solving the Potts model. Numerical experiments are presented to demonstrate the effectiveness of the proposed networks.

Key words: Potts model, operator splitting, deep neural network, image segmentation.

1 Introduction

In past decades, the deep neural network has emerged as a very successful technique in various fields. It has demonstrated impressive performance in many tasks, such as image processing, object detection, and natural language processing. In some tasks, deep neural networks even outperform humans.

Due to the great success of neural networks, over the past several years much work has been devoted to the mathematical understanding of neural networks and to explaining their success. Representation theories for function learning are studied in [4,17,38,46,66] for feedforward neural networks, in [70,71] for convolutional neural networks, and in [36,37,54] for convolutional residual networks. Recently, theoretical results for learning operators have been developed in [5,33,44]. The works mentioned above show that as long as the network depth and width are sufficiently large, deep neural networks can approximate any function or operator within a certain class to arbitrary accuracy. These works focus on the existence of a good approximator with the desired approximation error and use techniques from approximation theory. Recently, an analysis of neural ordinary differential equations from the perspective of optimal control was conducted in [58]. In this paper, we investigate the power of neural networks from another perspective: the network structure. We give an algorithmic explanation of neural networks, especially in their connections with operator-splitting algorithms.

Operator-splitting methods are a class of powerful methods for numerically solving complicated problems. The general idea is to decompose a difficult problem into several subproblems which are solved sequentially or simultaneously. Operator-splitting methods have been widely used in solving partial differential equations [24,26,47,49,64], image processing [21,22,40,41], surface reconstruction [30], obstacle problems [43], inverse problems [25], and computational fluid dynamics [6,52], etc. We refer readers to [28,48] for some survey discussions.

Image segmentation is an important subject in many fields, such as medical imaging and object detection. Many mathematical models and algorithms have been developed for image segmentation [10,11,13-15,31,53,63]. In [2,68], the segmentation problem is formulated as a min-cut or max-flow problem. One important model for image segmentation is the Potts model, which was first proposed in statistical mechanics [56]. In fact, the well-known Chan-Vese model [15] is a special case of the Potts model. In [65], detailed explanations are given to show that the Potts model is equivalent to a continuous min-cut/max-flow problem. Efficient algorithms for the Potts model are studied in [60,68]. We refer readers to the survey [61] for a comprehensive discussion of the Potts model. Recently, many deep-learning methods for image segmentation have also been proposed [23,57,72].

Following [62] and [39], we focus on the building blocks of neural networks and operator-splitting methods and make connections between them. We first introduce the structure of a standard neural network. Then we discuss two popular operator-splitting strategies: sequential splitting and parallel splitting. We show that for certain splitting strategies, the resulting splitting algorithm is equivalent to a neural network whose depth and width are determined by the splitting strategy. Such a connection is also observed and utilized in [32] to solve nonlinear partial differential equations by neural networks. We apply this connection to the Potts model for image segmentation and propose two networks inspired by operator-splitting methods. As the proposed networks are derived from the Potts model, they contain explicit regularizers that have physical meaning. The effectiveness of the proposed networks is demonstrated by numerical experiments.

This paper is structured as follows: we briefly introduce deep neural networks and operator-splitting methods in Sections 2 and 3, respectively. The Potts model for image segmentation is discussed in Section 4. We present the two networks inspired by operator-splitting methods in Sections 5 and 6, respectively. The paper is concluded in Section 7.

2 Deep neural networks

Deep neural networks have been successfully applied to many problems and have achieved remarkable performance. There are many network architectures, such as feedforward neural networks (FNN) [20], convolutional neural networks (CNN) [34] and residual neural networks (ResNet) [29]. It is easy to show that any CNN or ResNet can be realized by an FNN [37,54,70,71]. In this paper, we focus on FNNs and show their connections with operator-splitting methods.

The building blocks of FNNs are layers. An FNN is a composition of multiple layers, each of which takes an input $x\in\mathbb{R}^d$ and outputs a vector in $\mathbb{R}^{d'}$ for some integers $d,d'>0$. Each layer has the form

$$x \mapsto \sigma(Wx+b), \qquad (2.1)$$

where $W\in\mathbb{R}^{d'\times d}$ is a weight matrix, $b\in\mathbb{R}^{d'}$ is a bias term, and $\sigma(\cdot)$ is the activation function. Popular choices of $\sigma$ include the rectified linear unit (ReLU), the sigmoid function and the tanh function. The computation of a layer consists of two parts, a linear part $Wx+b$ and a nonlinear part $\sigma(\cdot)$. These two parts are conducted sequentially. Later we will show the similarity between this structure and operator-splitting methods.

An FNN with depth $L$ is a composition of $L-1$ layers followed by a fully connected layer:

$$f(x) = W_L\,\sigma_{L-1}\big(W_{L-1}\cdots\sigma_1(W_1x+b_1)\cdots+b_{L-1}\big)+b_L,$$

where $W_l\in\mathbb{R}^{d_l\times d_{l-1}}$, $b_l\in\mathbb{R}^{d_l}$ and $\sigma_l$ denote the weight matrix, bias and activation function in the $l$-th layer, with $d_0=d$ being the input dimension and $d_L$ being the output dimension. We call $\max_l d_l$ the width of $f$. The computation of $f(x)$ can be taken as passing the input $x$ through the $L$ layers sequentially.
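To make this structure concrete, here is a minimal NumPy sketch of the FNN just described; the sizes, random weights, and the choice of ReLU are illustrative assumptions of ours, not choices from the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def fnn_forward(x, weights, biases):
    """Compose L-1 activated layers, then a final fully connected layer."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(W @ x + b)                   # one layer: linear part, then nonlinear part
    return weights[-1] @ x + biases[-1]       # final fully connected layer, no activation

# Example: depth L = 3, widths d_0 = 4, d_1 = d_2 = 8, d_3 = 2.
rng = np.random.default_rng(0)
dims = [4, 8, 8, 2]
weights = [rng.standard_normal((dims[l + 1], dims[l])) for l in range(3)]
biases = [rng.standard_normal(dims[l + 1]) for l in range(3)]
print(fnn_forward(rng.standard_normal(4), weights, biases))
```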

3 Operator-splitting methods

The operator-splitting method is a powerful approach for solving complicated problems, such as time-evolution problems and optimization problems. The general idea is to decompose a complicated problem into several easy-to-solve subproblems, so that each subproblem can be solved explicitly or efficiently. According to [19], the first operator-splitting method is the famous Lie scheme, introduced in [35] to solve the initial value problem for the dynamical system

$$\frac{dX}{dt}=(A+B)X,\qquad X(0)=X_0, \qquad (3.1)$$

where $X_0\in\mathbb{R}^d$ for some integer $d>0$, and $A,B\in\mathbb{R}^{d\times d}$. In the Lie scheme, one solves (3.1) by splitting it into two subproblems, one governed by $A$ and one by $B$. In each subproblem, $X$ is governed by one operator and evolves for a small time step. One can show that this scheme is first-order accurate in time. Later, a second-order splitting scheme, the Strang scheme, was introduced in [59]. We refer to this type of splitting scheme as a sequential scheme, as the subproblems are solved sequentially. Another type of splitting strategy is parallel splitting [47], which solves the subproblems simultaneously and then takes the average. Such a method is proved to be first-order accurate in time. Due to their simplicity and versatility, operator-splitting methods have been widely used in solving partial differential equations [6,9,26,47,49], in which the operators in the PDE are decomposed into subproblems.
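The first-order accuracy of the Lie scheme is easy to observe numerically. Below is a small self-contained check (our own example, not from the paper): halving the time step roughly halves the error.

```python
# Lie splitting for dX/dt = (A+B)X: evolve with A, then with B, each for dt.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
X0, T = rng.standard_normal(3), 1.0
X_exact = expm(T * (A + B)) @ X0

for N in (50, 100, 200):
    dt, X = T / N, X0.copy()
    for _ in range(N):
        X = expm(dt * A) @ X      # substep 1: evolve with A for time dt
        X = expm(dt * B) @ X      # substep 2: evolve with B for time dt
    print(N, np.linalg.norm(X - X_exact))  # error shrinks like O(dt)
```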

The operator-splitting method is also a popular choice for solving optimization problems. Consider an optimization problem in which one needs to minimize a functional. One may first derive the optimality condition for the minimizer and associate it with an initial value problem in the spirit of gradient flow. Solving the optimization problem is then converted into finding the steady-state solution of the initial value problem, which can be solved by operator-splitting methods. Such a strategy has been used in [21,22,30,41,42]. For optimization problems, the alternating direction method of multipliers (ADMM) has been extensively studied in past decades [1,60,67,68]. Indeed, ADMM is a special type of operator-splitting method. We refer readers to [27,28] for a comprehensive discussion of operator-splitting methods.

In the rest of this section, we focus on the following initial value problem:

$$\begin{cases}\dfrac{\partial u}{\partial t}=\sum_{k=1}^{K}\big(A_ku+B_k(u)+g_k\big) & \text{in } \Omega\times(0,T],\\ u(0)=u_0 & \text{in } \Omega,\end{cases} \qquad (3.2)$$

where $K$ is a positive integer, $\Omega$ is our computational domain, $A_ku$ is a linear operator of $u$, $B_k(u)$ is an operator on $u$ that might be nonlinear, and the $g_k$'s are functions defined on $\Omega$. We will discuss sequential splitting and parallel splitting schemes for solving (3.2), and show their connections to deep neural networks. In the following, assume the time interval $[0,T]$ is discretized into $N$ subintervals. Denote $\Delta t=T/N$ and $t_n=n\Delta t$. We use $u^n$ to denote our numerical solution at $t_n$.

3.1 Sequential splitting

For sequential splitting, a simple example is the Lie scheme. Given $u^n$, we compute $u^{n+1}$ by $K$ substeps: for $k=1,\cdots,K$, solve

$$\begin{cases}\dfrac{\partial v}{\partial t}=A_kv+g_k & \text{in } \Omega\times(t_n,t_{n+1}],\\ v(t_n)=u^{n+(k-1)/K},\end{cases} \qquad (3.4)$$

and set $u^{n+(2k-1)/(2K)}=v(t_{n+1})$; then solve

$$\begin{cases}\dfrac{\partial v}{\partial t}=B_k(v) & \text{in } \Omega\times(t_n,t_{n+1}],\\ v(t_n)=u^{n+(2k-1)/(2K)},\end{cases} \qquad (3.5)$$

and set $u^{n+k/K}=v(t_{n+1})$.

By time-discretizing (3.4) by the forward Euler method, we get

$$u^{n+(2k-1)/(2K)}=(I+\Delta t\,A_k)\,u^{n+(k-1)/K}+\Delta t\,g_k,$$

where $I$ denotes the identity operator: $Iu=u$.

By time-discretizing (3.5) using the backward Euler method, we get

$$u^{n+k/K}=u^{n+(2k-1)/(2K)}+\Delta t\,B_k\big(u^{n+k/K}\big).$$

Denoting the resolvent operator for $B_k$ by $\rho_k$, i.e., $\rho_k=(I-\Delta t\,B_k)^{-1}$, we have

$$u^{n+k/K}=\rho_k\Big((I+\Delta t\,A_k)\,u^{n+(k-1)/K}+\Delta t\,g_k\Big). \qquad (3.9)$$

Connections to FNN. Comparing (3.9) and (2.1), we see that they have the same structure. By choosing

$$W_k=I+\Delta t\,A_k,\qquad b_k=\Delta t\,g_k,\qquad \sigma_k=\rho_k \quad\text{for } k=1,\cdots,K,$$

and $W_{K+1}=I$, $b_{K+1}=0$, the sequential splitting scheme is an FNN with $K+1$ layers, where the $k$-th substep corresponds to the $k$-th layer. In other words, any FNN with $W_{K+1}=I$, $b_{K+1}=0$ and activation functions $\rho_k$ is a sequential splitting scheme solving some initial value problem.
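Under the forms reconstructed above, one sweep of the sequential splitting scheme reads exactly like a stack of FNN layers. The following NumPy sketch makes the correspondence explicit; the matrices are illustrative, and we take each $B_k$ linear so that the resolvent is a plain linear solve.

```python
import numpy as np

def splitting_sweep(u, A_list, B_list, g_list, dt):
    """One step u^n -> u^{n+1}: K sublayers of the form rho_k(W_k u + b_k)."""
    I = np.eye(len(u))
    for A_k, B_k, g_k in zip(A_list, B_list, g_list):
        z = (I + dt * A_k) @ u + dt * g_k       # linear part: W_k u + b_k
        u = np.linalg.solve(I - dt * B_k, z)    # "activation": rho_k = (I - dt B_k)^{-1}
    return u

d, K, dt = 4, 3, 0.1
rng = np.random.default_rng(2)
A_list = [0.1 * rng.standard_normal((d, d)) for _ in range(K)]
B_list = [-0.5 * np.eye(d) for _ in range(K)]   # linear B_k => rho_k is a linear solve
g_list = [rng.standard_normal(d) for _ in range(K)]
print(splitting_sweep(rng.standard_normal(d), A_list, B_list, g_list, dt))
```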

3.2 Parallel splitting

Using parallel splitting [47], we solve (3.2) by first solving $K$ parallel substeps: for $k=1,\cdots,K$, solve

$$\begin{cases}\dfrac{\partial v}{\partial t}=K\big(A_kv+B_k(v)+g_k\big) & \text{in } \Omega\times(t_n,t_{n+1}],\\ v(t_n)=u^n,\end{cases} \qquad (3.11)$$

and set $u^{n+1,k}=v(t_{n+1})$. Then compute

$$u^{n+1}=\frac{1}{K}\sum_{k=1}^{K}u^{n+1,k}.$$

One can show that when the $B_k$'s are linear operators and (3.11) is solved by the forward Euler method, the parallel splitting scheme is first-order accurate in time. Note that the number of operators $K$ is multiplied into the right-hand side of (3.11).

Similar to (3.4)-(3.5), we can use the same idea to solve (3.11). After some derivations, we get

$$u^{n+1}=\frac{1}{K}\sum_{k=1}^{K}\rho_k\Big((I+K\Delta t\,A_k)\,u^n+K\Delta t\,g_k\Big),\qquad \rho_k=(I-K\Delta t\,B_k)^{-1}. \qquad (3.13)$$

Connections to FNN. Again, (3.13) and (2.1) have the same structure. Set $L=2$,

$$W_1=\begin{bmatrix}I+K\Delta t\,A_1\\ \vdots\\ I+K\Delta t\,A_K\end{bmatrix},\qquad b_1=K\Delta t\begin{bmatrix}g_1\\ \vdots\\ g_K\end{bmatrix},\qquad W_2=\frac{1}{K}\big[I,\cdots,I\big],\qquad b_2=0, \qquad (3.15)$$

where $\sigma_1$ is applied blockwise:

$$\sigma_1\left(\begin{bmatrix}v_1\\ \vdots\\ v_K\end{bmatrix}\right)=\begin{bmatrix}\rho_1(v_1)\\ \vdots\\ \rho_K(v_K)\end{bmatrix};$$

then the parallel splitting scheme is an FNN with 2 layers. In other words, any 2-layer FNN with $\sigma_1$, $W_2$, $b_2$ given in the format of (3.15) is a parallel splitting scheme solving some initial value problem.
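For comparison, here is the parallel variant as a sketch, with the same illustrative setup as above: the $K$ substeps are independent, and the averaging at the end plays the role of the fixed second layer $W_2=\frac{1}{K}[I,\cdots,I]$.

```python
import numpy as np

def parallel_step(u, A_list, B_list, g_list, dt):
    """One parallel-splitting step: K independent substeps, then their average."""
    K, I = len(A_list), np.eye(len(u))
    outs = [np.linalg.solve(I - K * dt * B_k,                   # rho_k
                            (I + K * dt * A_k) @ u + K * dt * g_k)
            for A_k, B_k, g_k in zip(A_list, B_list, g_list)]
    return sum(outs) / K                                        # the averaging layer W_2

d, K, dt = 4, 3, 0.1
rng = np.random.default_rng(5)
A_list = [0.1 * rng.standard_normal((d, d)) for _ in range(K)]
B_list = [-0.5 * np.eye(d) for _ in range(K)]
g_list = [rng.standard_normal(d) for _ in range(K)]
print(parallel_step(rng.standard_normal(d), A_list, B_list, g_list, dt))
```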

4 Potts model and image segmentation

We next focus on image segmentation and utilize the relation between operator-splitting methods and FNNs to derive new networks.

We start with the Potts model, which has close relations to a large class of image segmentation models. Let $\Omega$ be the image domain. The continuous two-phase Potts model is given as [3,7,8,12,69]

$$\min_{\Sigma_1,\Sigma_2}\ \sum_{k=1}^{2}\int_{\Sigma_k}f_k\,dx+\lambda\sum_{k=1}^{2}\mathrm{Per}(\Sigma_k)\quad\text{s.t.}\quad \Sigma_1\cup\Sigma_2=\Omega,\ \Sigma_1\cap\Sigma_2=\emptyset, \qquad (4.1)$$

where $\mathrm{Per}(\Sigma_k)$ is the perimeter of $\Sigma_k$, and the $f_k$'s are nonnegative weight functions. In (4.1), the first term is a data-fidelity term which depends on the input image $f$. The second term is a regularization term which penalizes the perimeters of the segmented regions.

A popular choice of $f_k$ is

$$f_k=(f-c_k)^2,$$

where $c_k$ is the estimated mean density of $f(x)$ on $\Sigma_k^*$. Here we use $\Sigma_k^*$ to denote the 'optimal' segmentation region. With this choice, and using a binary function $v$ to represent the foreground of the segmentation result, we can rewrite the two-phase Potts model as

$$\min_{v:\Omega\to\{0,1\}}\ \int_\Omega (f-c_1)^2v\,dx+\int_\Omega (f-c_2)^2(1-v)\,dx+\lambda\int_\Omega|\nabla v|\,dx \qquad (4.2)$$

for some constant $\lambda>0$. Model (4.2) is the well-known Chan-Vese model [16].
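As a concrete illustration of these quantities (standard Chan-Vese computations, not code from the paper), the means $c_k$ and the weights $f_k$ can be computed from the image and a current mask as follows.

```python
import numpy as np

def chan_vese_weights(f, mask):
    """f: grayscale image; mask: binary foreground estimate of the same shape."""
    c1 = f[mask > 0.5].mean()           # mean intensity inside the region
    c2 = f[mask <= 0.5].mean()          # mean intensity outside
    f1, f2 = (f - c1) ** 2, (f - c2) ** 2
    return f1, f2, f1 - f2              # f1 - f2 acts as the region force F(f)

f = np.random.default_rng(3).random((8, 8))
mask = (f > 0.5).astype(float)
print(chan_vese_weights(f, mask)[2].shape)
```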

The first integral in (4.2) is linear in $v$. As one can take $c_k$ and $\Sigma_k^*$ as functions depending on $f$, in this case the $f_k$'s in (4.1) are functions of the input image $f$ only. We can thus rewrite the Potts model as

$$\min_{v:\Omega\to\{0,1\}}\ \int_\Omega F(f)\,v\,dx+\lambda\int_\Omega|\nabla v|\,dx, \qquad (4.3)$$

where $F(f)$ is a region force depending on the input image $f$. In the following sections, we will discuss two relaxations of the Potts model (4.3) and correspondingly propose two new networks for image segmentation.

5 Operator-splitting method inspired networks for image segmentation: Model I

Our first model utilizes (4.3) with the double-well potential and follows [39].

5.1 Model formulation

Model (4.3) requires the function $v$ to be binary. One relaxation of (4.3) using the double-well potential is

$$\min_{v}\ \int_\Omega F(f)\,v\,dx+\lambda L_\varepsilon(v),\qquad L_\varepsilon(v)=\int_\Omega\left(\frac{\varepsilon}{2}|\nabla v|^2+\frac{1}{\varepsilon}v^2(1-v)^2\right)dx. \qquad (5.1)$$

It is shown in [50,51] that $L_\varepsilon(v)$ converges to $C\,\mathrm{Per}(\Sigma_1)$ in the sense of $\Gamma$-convergence as $\varepsilon\to0$ for some constant $C$.

Denote the minimizer of (5.1) by $u$. It satisfies the optimality condition

$$F(f)-\lambda\varepsilon\Delta u+\frac{2\lambda}{\varepsilon}u(1-u)(1-2u)=0 \ \text{ in } \Omega,\qquad \frac{\partial u}{\partial n}=0 \ \text{ on } \partial\Omega. \qquad (5.2)$$

The gradient flow equation for minimization problem (5.1) is

$$\frac{\partial u}{\partial t}=\lambda\varepsilon\Delta u-\frac{2\lambda}{\varepsilon}u(1-u)(1-2u)-F(f),\qquad u(0)=G(f), \qquad (5.3)$$

for some initial condition $G(f)$, which is a function of $f$. Solving (5.2) is then equivalent to finding the steady-state solution of (5.3). As was done in [62], we add control variables $W(t)$ and $b(t)$ to (5.3) so that we can use them to drive the final state $u(T)$ of the following equation to some desired target:

$$\begin{cases}\dfrac{\partial u}{\partial t}=\lambda\varepsilon\Delta u+W(t)*u+b(t)-\dfrac{2\lambda}{\varepsilon}u(1-u)(1-2u)-F(f) & \text{in } \Omega\times(0,T],\\ \dfrac{\partial u}{\partial n}=0 & \text{on } \partial\Omega\times(0,T],\\ u(0)=G(f). \end{cases} \qquad (5.4)$$

Normally, the function $F(f)$ is a complicated function of $f$. We use a subnetwork to represent $F(\cdot)$. As usual, here and in the following, $*$ denotes the spatial convolution operator.
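For completeness, here is the variational step behind (5.2), written for the double-well functional as we reconstructed it in (5.1); the constants depend on that normalization. For a test function $\phi$, integration by parts with the Neumann boundary condition gives

```latex
\frac{d}{ds}\Big|_{s=0} L_\varepsilon(v+s\phi)
  = \int_\Omega \Big(\varepsilon\,\nabla v\cdot\nabla\phi
    + \frac{2}{\varepsilon}\, v(1-v)(1-2v)\,\phi\Big)\,dx
  = \int_\Omega \Big(-\varepsilon\,\Delta v
    + \frac{2}{\varepsilon}\, v(1-v)(1-2v)\Big)\,\phi\,dx,
```

so adding $\lambda$ times this variation to the variation of $\int_\Omega F(f)\,v\,dx$ and setting the sum to zero yields (5.2).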

5.2 An operator splitting method to solve (5.4)

We use the Lie scheme to solve (5.4). The time variable is discretized as in Section 3. Given $u^n$, we update $u^n\to u^{n+1/2}\to u^{n+1}$ in the following way:

Substep 1: Solve

$$\begin{cases}\dfrac{\partial v}{\partial t}=\lambda\varepsilon\Delta v+W(t)*v+b(t)-F(f) & \text{in } \Omega\times(t_n,t_{n+1}],\\ v(t_n)=u^n,\end{cases} \qquad (5.5)$$

and set $u^{n+1/2}=v(t_{n+1})$.

Substep 2: Solve

$$\begin{cases}\dfrac{\partial v}{\partial t}=-\dfrac{2\lambda}{\varepsilon}v(1-v)(1-2v) & \text{in } \Omega\times(t_n,t_{n+1}],\\ v(t_n)=u^{n+1/2},\end{cases} \qquad (5.6)$$

and set $u^{n+1}=v(t_{n+1})$.

5.3 Time discretization

We use a one-step forward Euler scheme to time-discretize (5.5) and a one-step backward Euler scheme to time-discretize (5.6):

$$u^{n+1/2}=u^n+\Delta t\big(\lambda\varepsilon\Delta u^n+W^n*u^n+b^n-F(f)\big), \qquad (5.7a)$$

$$u^{n+1}=u^{n+1/2}-\Delta t\,\frac{2\lambda}{\varepsilon}u^{n+1}(1-u^{n+1})(1-2u^{n+1}). \qquad (5.7b)$$

In image processing, the periodic boundary condition is widely used. Here we replace the Neumann boundary condition in (5.7a) by the periodic one. In (5.7a), the superscript $n$ of $W^n$ and $b^n$ differentiates the control variables at different times.

In our algorithm, we choose $G(f)$ as a convolution layer of $f$, i.e., a convolution of $f$ with a $3\times3$ kernel followed by a sigmoid function. Suppose we are given a set of images $\{f_i\}_{i=1}^{I}$ and the corresponding segmentation masks $\{m_i\}_{i=1}^{I}$. Note that both $F$ and $G$ are networks and contain learnable parameters. Denote the set of control variables by $\theta_1=\{\{W^n,b^n\}_{n=0}^{N-1},\theta_F,\theta_G\}$, where $\theta_F,\theta_G$ denote all network parameters in $F$ and $G$. Denote the procedure of numerically solving (5.4) with $N$ time steps and control variables $\theta_1$ by $\mathcal{N}_1(f;\theta_1)$.

We will learn $\theta_1$ by solving

$$\min_{\theta_1}\ \sum_{i=1}^{I}\ell\big(\mathcal{N}_1(f_i;\theta_1),m_i\big), \qquad (5.8)$$

where $\ell(\cdot,\cdot)$ is some loss function, such as the cross entropy.
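A conceptual PyTorch sketch of this setup is given below: unrolling the $N$ splitting steps is the forward pass, and solving (5.8) is an ordinary training loop. The simple convolutional stand-ins for $F$ and $G$, the zero padding (rather than periodic), the explicit (rather than backward Euler) treatment of the double-well step, the small time step, and the stabilizing clamp are all simplifications of ours, not the paper's choices.

```python
import torch
import torch.nn.functional as TF

class Model1(torch.nn.Module):
    def __init__(self, N=5, dt=0.02, lam_eps=1.0, lam_over_eps=15.0):
        super().__init__()
        self.N, self.dt = N, dt
        self.lam_eps, self.lam_over_eps = lam_eps, lam_over_eps
        self.W = torch.nn.Parameter(0.01 * torch.randn(N, 1, 1, 3, 3))  # kernels W^n
        self.b = torch.nn.Parameter(torch.zeros(N))                     # biases b^n
        self.G = torch.nn.Conv2d(1, 1, 3, padding=1)  # initial condition G(f)
        self.F = torch.nn.Conv2d(1, 1, 3, padding=1)  # stand-in for the UNet F(f)
        lap = torch.tensor([[0.0, 1, 0], [1, -4, 1], [0, 1, 0]])
        self.register_buffer("lap", lap.view(1, 1, 3, 3))  # 5-point Laplacian stencil

    def forward(self, f):
        u, Ff = torch.sigmoid(self.G(f)), torch.sigmoid(self.F(f))
        for n in range(self.N):
            # explicit linear step, cf. (5.7a)
            u = u + self.dt * (self.lam_eps * TF.conv2d(u, self.lap, padding=1)
                               + TF.conv2d(u, self.W[n], padding=1)
                               + self.b[n] - Ff)
            # double-well step, treated explicitly here for simplicity
            u = u - self.dt * 2 * self.lam_over_eps * u * (1 - u) * (1 - 2 * u)
            u = u.clamp(0, 1)  # projection, added only to stabilize this sketch
        return u.clamp(1e-6, 1 - 1e-6)

model = Model1()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
f = torch.rand(4, 1, 64, 64)                      # a batch of images
m = (torch.rand(4, 1, 64, 64) > 0.5).float()      # their segmentation masks
loss = TF.binary_cross_entropy(model(f), m)       # the loss l in (5.8)
opt.zero_grad(); loss.backward(); opt.step()      # one training iteration
```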

5.4 Connections to neural networks

The building block of scheme (5.5)-(5.6) consists of a linear step (5.7a) and a nonlinear step (5.7b). Thus one time step of $u^n$ corresponds to one layer of a network, with (5.7b) playing the role of the activation function. Unlike commonly used neural networks, the equivalent network of Model I has a sophisticated bias term $F(f)$, which is chosen as a subnetwork in this paper. The process (5.8) of learning $\theta_1$ is the same as training a network.

5.5 Numerical experiments

We demonstrate the effectiveness of Model I. In our experiments, we choose $G(f)$ as a convolution layer of $f$. The function $F(f)$ needs to be complicated enough to approximate the probability function in the Potts model. Here we set $F(f)$ as a UNet [57] subnetwork of $f$, i.e., $F$ has the same architecture as a UNet, takes $f$ as the input and outputs a matrix having the same size as $u$ with elements in $[0,1]$; see [39] for details. The $W^n$'s and $b^n$'s are convolution kernels and biases, respectively. For the parameters, we set $\Delta t=0.2$, $\lambda\varepsilon=1$, $\lambda/\varepsilon=15$.

We use the MSRA10K dataset [18], which contains 10,000 salient-object images with manually annotated masks. We choose 2500 images for training and 400 images for testing. All images and masks are resized to 192×256.

We compare Model I with UNet [57], UNet++ [72] and MANet [23]. All models are trained for 400 epochs. The comparison of accuracy and Dice score is shown in Table 1. We observe that Model I achieves higher accuracy and Dice score than the other networks. Some segmentation results are presented in Fig. 1. Model I successfully segments the targets from images with complicated structures.

Figure 1: Some results by Model I.

Table 1: Comparison of Model I with other networks in terms of accuracy and dice score.

6 Operator-splitting method inspired networks for image segmentation: Model II

As in [62], our second model incorporates (4.3) and the threshold-dynamics idea for approximating the perimeter of a region.

6.1 Model formulation

In (4.3), when $v$ is binary, $\int_\Omega|\nabla v|\,dx$ gives the perimeter of $\Omega_1$, the support of $v$. Replacing $\int_\Omega|\nabla v|\,dx$ by $\mathrm{Per}(\Omega_1)$ in (4.3), we get

$$\min_{v:\Omega\to\{0,1\}}\ \int_\Omega F(f)\,v\,dx+\lambda\,\mathrm{Per}(\Omega_1). \qquad (6.1)$$

Then we use a threshold-dynamics idea to approximate the perimeter term:

$$\mathrm{Per}(\Omega_1)\approx\sqrt{\frac{\pi}{\delta}}\int_\Omega v\,\big(G_\delta*(1-v)\big)\,dx, \qquad (6.2)$$

where $G_\delta$ is the two-dimensional Gaussian filter

$$G_\delta(x)=\frac{1}{4\pi\delta}\exp\left(-\frac{|x|^2}{4\delta}\right). \qquad (6.3)$$

It is shown in [55] that the approximation in (6.2) $\Gamma$-converges to $\mathrm{Per}(\Omega_1)$ as $\delta\to0$. Replacing $\mathrm{Per}(\Omega_1)$ by the approximation given in (6.2), the functional we are minimizing becomes

$$\min_{v:\Omega\to\{0,1\}}\ \int_\Omega F(f)\,v\,dx+\lambda\sqrt{\frac{\pi}{\delta}}\int_\Omega v\,\big(G_\delta*(1-v)\big)\,dx. \qquad (6.4)$$
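The approximation (6.2) is easy to test numerically. The sketch below is our own check, not the paper's code; the Gaussian filter uses standard deviation $\sqrt{2\delta}$, matching the reconstruction of (6.3) above, and we compare against the exact perimeter of a disk.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def td_perimeter(v, delta, h=1.0):
    """sqrt(pi/delta) * sum of v * (G_delta convolved with 1-v), on a pixel grid."""
    smoothed = gaussian_filter(1.0 - v, sigma=np.sqrt(2 * delta) / h, mode="wrap")
    return np.sqrt(np.pi / delta) * np.sum(v * smoothed) * h * h

# Indicator of a disk of radius r; its exact perimeter is 2*pi*r.
n, r = 256, 60
Y, X = np.mgrid[:n, :n]
v = ((X - n / 2) ** 2 + (Y - n / 2) ** 2 <= r * r).astype(float)
print(td_perimeter(v, delta=4.0), 2 * np.pi * r)  # the two values should be close
```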

Model (6.4) requires $v$ to be binary. We relax this constraint to $v\in[0,1]$ and consider the following functional instead

for some small $\varepsilon>0$. Such an approximation is a smoothed version of the binary constraint, and it converges to the original problem (6.4) as $\varepsilon\to0$; cf. [45].

If $u$ is a minimizer of (6.5), it satisfies the optimality condition

The gradient flow equation for this problem is:

We then introduce control variables to (6.6) and solve

for some initial condition $G(f)$, which is a function of $f$. Here we use notation different from that of Section 5 to distinguish the two sets of control variables.

Unlike Model I, which treats $F(f)$ and $b(t)$ as two separate functions, we use a different strategy for Model II. Since the terms $F(f)$ and $g$ are constant with respect to $u$, we can combine them into a single term, still denoted by $g$, which depends on $f$. The new problem is written as

6.2 An operator splitting method to solve (6.8)

We decompose the operator $A$ and the function $g$ into sums of $K$ terms for some positive integer $K$:

$$A=\sum_{k=1}^{K}A_k,\qquad g=\sum_{k=1}^{K}g_k, \qquad (6.9)$$

for some operators $A_k$ and functions $g_k$. We then use a Lie scheme to solve (6.8). Given $u^n$, we update $u^n\to u^{n+1/K}\to\cdots\to u^{n+(K-1)/K}\to u^{n+1}$ via $K$ substeps:

· For $k=1,\cdots,K-1$, solve

and set $u^{n+k/K}=v(t_{n+1})$.

· For $k=K$, solve

and set $u^{n+1}=v(t_{n+1})$.

For (6.10), we further apply a Lie scheme to decompose it into two substeps:

Substep 1: Solve

Substep 2: Solve

and set $u^{n+k/K}=v(t_{n+1})$.

Similarly, we solve (6.11) by the following two substeps:

Substep 1: Solve

Substep 2: Solve

and set $u^{n+1}=v(t_{n+1})$.

6.3 Time discretization

Problems (6.12)-(6.15) are only semi-constructive, as we still need to solve these initial value problems. In our algorithm, we time-discretize (6.12) and (6.14) by the forward Euler method and discretize (6.13) and (6.15) by the backward Euler method. The discretized scheme is given as follows. For (6.10), we use the following time discretization:

For (6.11), we use the following time discretization:

Collecting all control variables and network parameters into $\theta_2$, and denoting the procedure of numerically solving (6.8) with $N$ time steps and control variables $\theta_2$ by $\mathcal{N}_2(f;\theta_2)$, we will learn $\theta_2$ by solving

$$\min_{\theta_2}\ \sum_{i=1}^{I}\ell\big(\mathcal{N}_2(f_i;\theta_2),m_i\big), \qquad (6.18)$$

where $\ell(\cdot,\cdot)$ is some loss function, such as the cross entropy.

6.4 Connections to neural networks

The building block of scheme (6.10)-(6.11) is (6.16a)-(6.17b). Note that the first substep (6.16a) (resp. (6.17a)) is a linear step in the current iterate. The second substep (6.16b) (resp. (6.17b)) is a nonlinear step. Thus this procedure has the same form as a layer of a neural network, which consists of a linear step and a nonlinear (activation) step. The numerical scheme for (6.10)-(6.11) that maps $f\to u^0\to u^1\to\cdots\to u^N$ is a neural network with $KN$ layers and activation functions specified in (6.16b) and (6.17b). The process (6.18) of learning $\theta_2$ is the same as training a network.

6.5 Numerical experiments

In this section, we demonstrate the robustness of Model II against noise. We show that a single trained Model II can provide good segmentation results on images with various levels of noise. In our experiments, we choose $G(f)$ as a convolution layer of $f$. We set the $A_k$'s as learnable convolution kernels at different scales that extract various image features, and the $g_k$'s as convolution layers applied to $f$. For the parameters, we use $\Delta t=0.5$, $\varepsilon=2$, $\lambda=80$.

Figure 2: Results by Model II.

We use the MSRA10K dataset again, resizing all images to 192×256. In training, we train our model on images with noise standard deviation (SD) 1.0. This is a high noise level, and many existing algorithms cannot handle such strong noise. Some segmentation results are presented in Fig. 2. For various levels of noise (even very strong noise), the trained model segments the target well. Refer to [62] for more experiments and explanations.

7 Conclusions

In this paper, the relations between deep neural networks and operator-splitting methods are discussed, and two new networks inspired by operator-splitting methods are proposed for image segmentation. The two proposed networks are derived from the Potts model, with certain terms acting as regularizers with physical meaning. Essentially, the two networks are operator-splitting algorithms solving the Potts model. The effectiveness of the proposed networks is demonstrated by numerical experiments.

Acknowledgements

The work of Hao Liu is partially supported by HKBU 179356, NSFC 12201530 and HKRGC ECS 22302123. The work of Xue-Cheng Tai is partially supported by NSFC/RGC grant N-HKBU214-19 and the NORCE Kompetanseoppbygging program. The work of Raymond Chan is partially supported by HKRGC GRF grants CityU1101120 and CityU11309922, CRF grant C1013-21GF, and HKRGC-NSFC grant N_CityU214/19.