Ensemble Bayesian method for parameter distribution inference: application to reactor physics

Nuclear Science and Techniques, December 2023

Jia-Qin Zeng · Hai-Xiang Zhang · He-Lin Gong · Ying-Ting Luo

Abstract The estimation of model parameters is an important subject in engineering. In this area of work, the prevailing approach is to estimate or calculate these as deterministic parameters. In this study, we consider the model parameters from the perspective of random variables and describe the general form of the parameter distribution inference problem. Under this framework, we propose an ensemble Bayesian method by introducing Bayesian inference and the Markov chain Monte Carlo (MCMC) method. Experiments on a finite cylindrical reactor and a 2D IAEA benchmark problem show that the proposed method converges quickly and can estimate parameters effectively, even for several correlated parameters simultaneously. Our experiments include cases of engineering software calls, demonstrating that the method can be applied to engineering, such as nuclear reactor engineering.

Keywords Model parameters · Bayesian inference · Frequency distribution · Ensemble Bayesian method · KL divergence

1 Introduction

As mathematical modeling and simulation (M&S) have matured, they have emerged as powerful tools that promise to revolutionize engineering and science. M&S are expected to describe a system, on the basis of which further research can be conducted. For example, the Consortium for Advanced Simulation of Light Water Reactors (CASL) was established by the US Department of Energy (DOE) in 2010, with the goal of providing M&S capabilities that support and accelerate the improvement of nuclear energy's economic competitiveness and ensure nuclear safety [1].

Numerical simulations can differ significantly from experimental observations, and minimizing this difference has always been an important topic in engineering practice [2–4]. The gap between model and reality is particularly pronounced in reactor physics, primarily for the following reasons:

(1) The complex nature of nuclear phenomena. A nuclear system may involve neutron transport, thermal hydraulics, and fuel performance, among others, which makes it difficult to accurately model the neutronic behavior. Theoretically, this behavior can be well described by the Boltzmann transport equation, incorporating coefficients derived from various experimentally evaluated neutron interaction cross sections. These cross sections exhibit strong, discontinuous behavior in space due to material heterogeneities and extreme variations with energy due to resonance phenomena associated with compound nucleus formation [5–7].

(2) Inaccuracy of input parameters. A model requires input design data or explanatory data, such as geometry, composition, and control parameters, as well as input model parameters, including physical constants such as interaction cross sections and material thermal and mechanical properties. In particular, the two-level approach to reactor physics calculations (i.e., a micro-level model for determining macroscopic cross sections and a macro-level model for determining the final responses of interest) may lead to inexact input parameters [5–7]. Inexact parameters, in turn, clearly cause inaccurate outputs.

With the advancement of science and computational power, we can probably make fewer simplifications and approximations in our models, resulting in smaller differences, although the resources spent must still be considered for economic reasons. As we reduce this difference, it becomes increasingly dominated by the input parameters.

In particular, nuclear data, such as neutron cross sections and resonance parameters, are evaluated by combining approximate nuclear physics and experimental observations, both of which carry epistemic and aleatoric uncertainties. These uncertainties then propagate through a neutronics model, creating differences and uncertainties [8]. Re-evaluating the nuclear data would clearly be the best option. Unfortunately, the nuclear evaluation process is long and costly, making this unrealistic. For example, [9] estimated the cost of a single new observation of a nuclear datum at 400,000 US dollars.

The uncertainty of the input parameters causes uncertain behavior in the model, giving rise to three fields of statistics: uncertainty quantification (UQ), sensitivity analysis (SA), and data assimilation (DA) [10]. They are beneficial in engineering design, with the potential to significantly reduce unnecessary costs. For example, knowledge of the sensitivity of attributes has been used to reduce costly design margins.

Data assimilation assimilates previous experience or experimental observations to reduce the differences and uncertainties in model predictions. It was first proposed in the field of meteorology and is widely used to predict meteorological phenomena such as hurricanes [11]. Data assimilation methods can be roughly divided into two categories, variational data assimilation [12–14] and sequential data assimilation [15–17], although some hybrid techniques have also emerged [18–20]. Bayesian approaches to data assimilation were established after [21, 22], and [23] provided an overview of the Bayesian perspective, discussing several approaches from this viewpoint. In neutronics, data assimilation is primarily used to predict the bias of critical systems [24], thereby reducing the uncertainty of advanced reactor designs [25]. In recent years, research interest in data assimilation in neutronics has mainly focused on the adjustment of nuclear data, which can establish cross-correlations between nuclear data or nuclides that are traditionally unavailable [26, 27]. In addition, new methods based on stochastic sampling of input parameters have been proposed, such as MOCABA [28] and BMC [29]; D. Siefman showed that these two methods yield results similar to each other and to the traditional method known as generalized linear least squares (GLLS) [30]. Readers can refer to [31–33] for Bayesian applications in thermal hydraulics.

The method proposed in this study involves sampling a probability distribution, for which we use MCMC. The MCMC method began with the Metropolis method proposed by Metropolis et al. in 1953 and was extended by W. K. Hastings in 1970. In 1984, S. Geman et al. demonstrated how the method, known as the Gibbs sampler, can be adapted to the high-dimensional problems that arise in Bayesian statistics [34]. In 1987, S. Duane et al. combined MCMC and molecular dynamics methods into what is known as the hybrid Monte Carlo method or Hamiltonian Monte Carlo (HMC) [35]. Later, scholars developed a series of improvements on HMC, such as the No-U-Turn Sampler (NUTS) [36] and stochastic gradient Langevin dynamics (SGLD) [37].

Although the Bayesian approach has been widely used in data assimilation, there is still a lack of research on the selection of the prior distribution, which is generally chosen as the probability representation that exists before experimental observations and is often unsatisfactory. The prior distribution evidently affects the results of the method. This study aims to address this task, namely, to provide the initial input of prior distributions for classical data assimilation methods. As an example, recent work [38] applied the Bayesian neural network method to predict the beta decay lifetimes of atomic nuclei and their uncertainties; this approach improves learning accuracy and relies on prior distributions of parameters, so it may be useful to run the method proposed in this work once beforehand to supply more accurate prior distributions. Moreover, most data assimilation methods only provide point estimations for parameters; however, what if nuclear data are actually probabilistic? In fact, certain parameters in our understanding resemble distributions or random variables, such as the number of neutrons released per fission and the change in the concentration of nuclear fuel over a short period due to fuel depletion. We attempt to fill this research gap and take a first step toward this distribution inference. Therefore, we propose an ensemble Bayesian method. It is also worth mentioning the following:

(1) The proposed method is simulation-based. As strategies for the design and evaluation of complex nuclear systems have shifted from heavy reliance on expensive experimental validation to highly accurate numerical simulations, more attention has been paid to simulation-based approaches. This allows our method to play a role in guiding the overall system design as well as in optimizing the setup of validating benchmark experiments.

(2) The proposed method has good compatibility. Because numerous legacy codes are used extensively in the design process, they are expected to persist as integral parts of nuclear design calculations. Our method can be incorporated into such legacy codes, making it tractable for nuclear systems simulation.

The remainder of this paper is organized as follows. In Sect. 2, we describe the parameter distribution inference problem in a general manner and describe the corresponding experiment. In Sect. 3, we first introduce Bayesian inference and the Metropolis–Hastings algorithm, followed by the core algorithm. In Sect. 4, we analyze the convergence of the method and illustrate a series of numerical tests of the proposed method, including a finite cylindrical reactor and the 2D IAEA benchmark problem. Finally, we provide a brief conclusion in the last section.

2 Problem description

When modeling a physical phenomenon, the first step is to introduce reliable theories, such as conservation laws and transport theory. In nuclear reactor physics, there are many classic models, such as the finite cylindrical reactor and the 2D IAEA benchmark problem, which are presented in Sect. 4. However, it is worth emphasizing that the method proposed in this study is valid regardless of the theories introduced. For clarity, we skip this step and consider the following general system:

y = f(x, μ), z = y(1 + ε),

where x ∈ D ⊂ R³ is a spatial vector, y ∈ R is an observable physical quantity, z ∈ R is an observation of y, possibly with noise ε, and μ = (μ^(1), …, μ^(n))^T is the vector of input parameters of the model.

Here, f represents the intrinsic functional relationship between these variables, which models the corresponding physical phenomenon, and ε ∼ N(0, σ²) is the zero-mean Gaussian relative noise that models the observation noise.

2.1 Perspective of random variables

In current research, most data assimilation methods only provide point estimations for the model parameters μ. However, certain parameters in our understanding resemble distributions or random variables, such as the number of neutrons released per fission and the change in the concentration of nuclear fuel over a short period due to fuel depletion. Here, we deal with the latter case. Thus, μ = μ(θ) = (μ^(1)(θ), …, μ^(n)(θ))^T is assumed to be an n-dimensional parameter vector dependent on another parameter θ. The general parameter θ ∈ [a, b] might be time t or burnup Bu. We have the following theorem [39]: let θ ∼ U[a, b] and μ : R → R^n be a Borel measurable map; then, μ(θ) is an n-dimensional random vector.

In the following discussion, we are concerned with the overall statistical regularity of the parameter μ within the interval [a, b] and will always treat μ as a random vector with an unknown real joint distribution. Our goal is to estimate this distribution using observational data.

2.2 Experimental setup

To achieve our goal, a special type of experimental setup was designed to obtain the observational data required by the method proposed in the next section.Specifically, we obtained data in the following manner:

(i) Choose m spatial positions x_1, …, x_m ∈ D and set S = {x_1, …, x_m} to be the set of sensor nodes.

(ii) Sample k values of θ, a ≤ θ_1 < … < θ_k ≤ b, from the uniform distribution U(a, b). For example, if θ denotes time t, then θ_1, …, θ_k are the observation times.

(iii) At each θ_j, obtain m observations z_{1,j}, …, z_{m,j} through the sensors.

Because we use m sensors, we obtain the following m observation equations:

z_i(x_i, μ(θ_j)) = f(x_i, μ(θ_j))(1 + ε_i), i = 1, …, m,

where ε_i is the zero-mean Gaussian relative noise of the ith sensor. It is natural to assume that the ε_i are independent and have a common distribution, i.e., ε_i ∼ N(0, σ²) for all i = 1, …, m.

After the previous steps, we obtain m × k observations z_i(x_i, μ(θ_j)), i = 1, …, m; j = 1, …, k. For simplicity, we use the notation z_{i,j} = z_i(x_i, μ(θ_j)).
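As a sketch, the three-step data-collection procedure above can be simulated as follows; the forward model f and the parameter map μ(θ) are hypothetical stand-ins, since the text keeps f general.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical forward model f(x, mu); the text keeps f general.
def f(x, mu):
    return np.exp(-mu * x)

m, k = 5, 4                    # number of sensors and number of theta samples
a, b, sigma = 0.0, 1.0, 0.02   # theta range and relative noise level
x = np.linspace(0.5, 7.5, m)   # step (i): sensor positions x_1, ..., x_m

# Step (ii): sample and order theta_1 < ... < theta_k from U(a, b)
theta = np.sort(rng.uniform(a, b, size=k))
mu_of_theta = 1.0 + 0.5 * theta          # hypothetical map mu(theta)

# Step (iii): z_{i,j} = f(x_i, mu(theta_j)) * (1 + eps_i), relative Gaussian noise
Z = np.empty((m, k))
for j, mu in enumerate(mu_of_theta):
    eps = rng.normal(0.0, sigma, size=m)
    Z[:, j] = f(x, mu) * (1.0 + eps)

print(Z.shape)   # (5, 4): the m x k observation matrix
```

The resulting m × k matrix Z plays the role of the observations z_{i,j} used by the method in the next section.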

3 Ensemble Bayesian method

The Bayesian approach has been widely used in data assimilation and underlies the method proposed in this study. We first provide an overview of Bayesian inference.

3.1 Bayesian inference

Let X and Y be the quantities of interest that cannot and can be directly observed, respectively; the observations of Y make up our data. From the perspective of Bayesian inference, there exists a joint probability distribution of all unobserved and observable quantities, denoted by p(x, y). Bayesian inference determines the conditional distribution of the unobserved quantities of interest given the observed data. This is formally accomplished by applying the Bayes formula:

p(x | y) = p(x, y) / p(y) = p(y | x) p(x) / p(y),

provided p(y) > 0.

We thus obtain the formula for the posterior distribution, which we will sample later. Some commonly used sampling algorithms include adaptive rejection sampling. However, in many cases, there is no analytical form for the posterior, which makes many sampling algorithms difficult to apply. To overcome this problem, we introduce the MCMC method, a technique for sampling probability distributions. Here, we consider the Metropolis–Hastings algorithm, which has unique advantages for posterior distribution sampling.

3.2 Metropolis–Hastings algorithm

The classical Metropolis–Hastings algorithm [40] was first proposed in the early 1950s and extended by W. K. Hastings in 1970. As the posterior distribution we derive later is continuous, the algorithm for a continuous state space is given below.

Let π be the probability density function to be sampled and π_s be the value of π at s. Let T be a transition function of any irreducible Markov chain with the same state space as π, and let T_{s,t} be the value at t of the conditional density function given s. Furthermore, we assume that we know how to sample from T. The chain T is used as a proposal chain, generating the elements of a sequence that the algorithm decides whether to accept. The steps of the Metropolis–Hastings algorithm are presented in Algorithm 1.

For the Metropolis–Hastings proposal distribution, if we choose a distribution symmetric about the current state, such as a uniform distribution on an interval of length two centered at the current state, or a Gaussian distribution whose expectation is the current state, we can simplify Eq. (1) to:

because T_{μ′, μ_{u−1}} = T_{μ_{u−1}, μ′}. Moreover, because only the ratio of π values appears, the intractable normalizing constant of a posterior distribution cancels; this is the main reason why the algorithm works well with posterior distributions. In the following, we use the Gaussian distribution whose expectation is the current state as the proposal distribution.
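A minimal random-walk Metropolis sketch of this simplified acceptance rule, using log densities for numerical stability; the standard-normal target and all settings are illustrative, not from the text:

```python
import numpy as np

rng = np.random.default_rng(1)

def metropolis_hastings(log_pi, x0, n_steps, proposal_std):
    """Metropolis-Hastings with a Gaussian proposal centred at the current
    state; symmetry (T_{x',x} = T_{x,x'}) reduces the acceptance ratio to
    pi(x') / pi(x), so only an unnormalised density is needed."""
    x = x0
    chain = np.empty(n_steps)
    for v in range(n_steps):
        x_prop = rng.normal(x, proposal_std)      # propose from N(x, std^2)
        log_alpha = log_pi(x_prop) - log_pi(x)    # log of pi(x') / pi(x)
        if np.log(rng.uniform()) < log_alpha:     # accept with prob min(1, alpha)
            x = x_prop
        chain[v] = x
    return chain

# Toy target: standard normal, known only up to its normalizing constant
chain = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0,
                            n_steps=20000, proposal_std=1.0)
burned = chain[5000:]   # discard burn-in steps; mean ~ 0, std ~ 1
print(burned.mean(), burned.std())
```

Passing only the unnormalised log density is what makes this directly applicable to the posterior derived later.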

When π is the joint probability density function of multivariate random variables, a proposal chain T from which we know how to sample is difficult to obtain. In this case, the single-component Metropolis–Hastings algorithm is used: each component is updated in turn using a one-dimensional proposal distribution conditioned on the current values of the remaining components.

Algorithm 1 Metropolis–Hastings algorithm

Algorithm 2 Single-component Metropolis–Hastings algorithm

In practice, we can directly choose a one-dimensional distribution from which we know how to sample, such as a uniform or Gaussian distribution, as the one-dimensional proposal distribution. This avoids the difficulty of sampling from a multidimensional distribution.
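A sketch of the single-component variant on a toy correlated two-dimensional Gaussian target; the target, correlation 0.8, and all settings are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def single_component_mh(log_pi, x0, n_steps, proposal_std):
    """Single-component Metropolis-Hastings: sweep over the coordinates,
    updating one at a time with a 1-D Gaussian proposal, so no
    multidimensional proposal distribution is ever sampled."""
    x = np.asarray(x0, dtype=float)
    chain = np.empty((n_steps, x.size))
    for v in range(n_steps):
        for i in range(x.size):                  # one 1-D update per component
            x_prop = x.copy()
            x_prop[i] = rng.normal(x[i], proposal_std)
            if np.log(rng.uniform()) < log_pi(x_prop) - log_pi(x):
                x = x_prop
        chain[v] = x
    return chain

# Toy target: zero-mean 2-D Gaussian with correlation 0.8
prec = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 1.0]]))
log_pi = lambda x: -0.5 * x @ prec @ x

chain = single_component_mh(log_pi, [0.0, 0.0], n_steps=20000, proposal_std=1.0)
samples = chain[5000:]
print(np.corrcoef(samples.T)[0, 1])   # close to the target correlation 0.8
```

Even though every proposal is one-dimensional, the sweep recovers the joint structure of the target, which is why the method suits the correlated multiparameter posteriors in Sect. 4.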

3.3 Ensemble Bayesian algorithm

In this section, we derive the core method, combined with the experimental setup (observation data) of Sect. 2.2. It includes three steps: Bayesian inference, Metropolis–Hastings sampling, and performance evaluation.

3.3.1 Bayesian inference

Because θ_1, …, θ_k are samples of θ ∼ U[a, b], it is easy to verify that μ(θ_1), …, μ(θ_k) are also samples of μ(θ). In view of this, our basic idea is to estimate the distribution of μ by aggregating the individual estimates of μ(θ_j), j = 1, …, k.

For each θ_j, there are m observations z_{1,j}, …, z_{m,j}. Let μ_j = μ(θ_j) denote the value to be estimated. From the perspective of Bayesian inference and the Bayes formula, we have

where π(μ_j) denotes the prior distribution. In an experiment, the prior distribution is usually given by assumptions or guesses, or evaluated by traditional methods with uncertainties.

By the independence assumption on the observation noise,

p(z_{1,j}, …, z_{m,j} | μ_j) = ∏_{i=1}^m p(z_{i,j} | μ_j)

is the product of m Gaussian density functions. Finally,

p(μ_j | z_{1,j}, …, z_{m,j}) ∝ π(μ_j) ∏_{i=1}^m p(z_{i,j} | μ_j),

where π(μ_j) is the prior distribution and p(z_{i,j} | μ_j) is the density function of the Gaussian distribution N(f(x_i, μ_j), (f(x_i, μ_j)σ)²).

From the previous process, we obtain the posterior distributions of all μ_j. These are joint probability density functions of multivariate random variables and are continuous; therefore, they are suitable for the single-component Metropolis–Hastings algorithm presented in Algorithm 2.
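Under the relative-noise model, the unnormalised log posterior can be written down directly; in this sketch, f is again a hypothetical forward model and the flat prior is for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)

def log_posterior(mu_j, z_j, x_sensors, f, sigma, log_prior):
    """Unnormalised log of pi(mu_j) * prod_i p(z_{i,j} | mu_j), where
    z_{i,j} ~ N(f(x_i, mu_j), (f(x_i, mu_j) * sigma)^2) (relative noise)."""
    pred = f(x_sensors, mu_j)                # model predictions at the sensors
    std = np.abs(pred) * sigma               # relative-noise standard deviations
    log_lik = np.sum(-0.5 * ((z_j - pred) / std) ** 2 - np.log(std))
    return log_lik + log_prior(mu_j)

# Hypothetical forward model and synthetic data generated at true mu = 1.0
f = lambda x, mu: np.exp(-mu * x)
x_s = np.linspace(0.5, 7.5, 50)
sigma = 0.02
z = f(x_s, 1.0) * (1.0 + rng.normal(0.0, sigma, size=x_s.size))
flat_prior = lambda mu: 0.0                  # improper flat prior, illustration only

lp_true = log_posterior(1.0, z, x_s, f, sigma, flat_prior)
lp_off = log_posterior(1.5, z, x_s, f, sigma, flat_prior)
print(lp_true > lp_off)   # True: the data favour the generating value
```

This is exactly the function one would hand to a Metropolis–Hastings sampler in place of an analytical posterior.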

3.3.2 Metropolis–Hastings sampling

(i) Choose a new state β^(v+1) according to a Gaussian proposal distribution centered at the current state with standard deviation σ^(v+1); that is, choose β^(v+1) with probability density

It should be noted that the selection of the standard deviation σ^(v+1) of the proposal distribution matters considerably: an excessively large or small value will degrade the results of the algorithm. It must be chosen carefully, in combination with the scale of the component μ^(v+1).

(ii) Decide whether to accept β^(v+1). Let

Although the Markov chain X_0, X_1, … has the stationary distribution π, not all outputs of Algorithm 2 are samples of p(μ_j | z_{1,j}, …, z_{m,j}). According to MCMC theory, for any sample path of the Markov chain X_0, X_1, …, only when the number of iterations is sufficiently large does the process approximate random sampling from the distribution p(μ_j | z_{1,j}, …, z_{m,j}).

Let N_con be the number of convergence (burn-in) steps and N_tot the total number of iteration steps, and denote N = N_tot − N_con. For each j = 1, …, k, we collect N samples μ̂_{j,N_con+1}, …, μ̂_{j,N_tot} from p(μ_j | z_{1,j}, …, z_{m,j}) and aggregate all of them to obtain a frequency distribution as an estimate of the real distribution of μ.

MCMC is known to suffer from the problem of dependence between samples because the generated samples are implemented through a Markov chain.However, our technical path is to examine these samples as a whole and then achieve an overall estimation of the parameters; thus, the impact of these correlations on the method is minimal.
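The pooling step can be illustrated with a conjugate stand-in in which each posterior p(μ_j | z_{1,j}, …, z_{m,j}) is Gaussian in closed form, so it can be sampled exactly instead of via Metropolis–Hastings; the direct-observation model and all numbers except the real distribution N(11.0, 0.36) are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(4)

k, m, N = 200, 50, 100        # theta samples, sensors, draws kept per posterior
sigma_obs = 0.3               # assumed absolute observation noise
mu_j = rng.normal(11.0, 0.6, size=k)     # mu(theta_j) from the real N(11.0, 0.36)

pooled = []
for mu in mu_j:
    z = mu + rng.normal(0.0, sigma_obs, size=m)  # direct noisy observations of mu_j
    # Conjugate posterior under a flat prior: N(mean(z), sigma_obs^2 / m)
    draws = rng.normal(z.mean(), sigma_obs / np.sqrt(m), size=N)
    pooled.append(draws)                         # the N retained samples for this j

pooled = np.concatenate(pooled)   # k * N samples -> frequency distribution of mu
print(pooled.mean(), pooled.std())   # close to 11.0 and 0.6
```

Pooling the k × N draws, rather than k posterior means, preserves the spread of μ, which is the key observation of the scheme comparison in Sect. 4.1.1.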

3.3.3 Performance evaluation

To quantify the quality of the estimation, we introduce the Kullback–Leibler divergence [41], which has been widely used in information theory. Let p(x) and q(x) be two probability density functions over R^n. The Kullback–Leibler divergence (KL divergence) between the distributions p(x) and q(x) is defined as

KL(p‖q) = ∫_{R^n} p(x) log( p(x) / q(x) ) dx.

KL(p‖q) ≥ 0, with equality if and only if p(x) = q(x). The Kullback–Leibler divergence quantifies how close one probability distribution is to another; a larger KL divergence indicates a larger difference between the two distributions.

Because the Gaussian distribution is the most common type of distribution and is uniquely determined by its expectation and variance, we propose estimating the parameter using the Gaussian distribution with the expectation and variance being the mean and variance of the frequency distribution, respectively.We then evaluated the performance of the estimation by comparing the KL divergence of the prior distribution with the real distribution and that of the Gaussian distribution used for the estimation with the real distribution.

Suppose p(x) and q(x) are the density functions of N(μ_1, σ_1²) and N(μ_2, σ_2²), respectively. Then the KL divergence has the closed form

KL(p‖q) = log(σ_2/σ_1) + (σ_1² + (μ_1 − μ_2)²) / (2σ_2²) − 1/2.
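The closed form for two univariate Gaussians (in the N(mean, variance) convention used above) can be checked numerically; the example values are illustrative.

```python
import numpy as np

def kl_gaussian(mu1, sigma1, mu2, sigma2):
    """KL(p || q) for p = N(mu1, sigma1^2) and q = N(mu2, sigma2^2):
    log(sigma2/sigma1) + (sigma1^2 + (mu1 - mu2)^2) / (2 sigma2^2) - 1/2."""
    return (np.log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2)**2) / (2.0 * sigma2**2)
            - 0.5)

print(kl_gaussian(0.0, 1.0, 0.0, 1.0))    # 0.0: identical distributions
# KL is not symmetric: KL(p||q) generally differs from KL(q||p)
print(kl_gaussian(11.0, 0.6, 12.0, 0.5), kl_gaussian(12.0, 0.5, 11.0, 0.6))
```

This is the quantity used below to compare the prior and the estimated distribution against the real one.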

4 Numerical examples

In this section, we examine the effectiveness of the proposed method. Two models in nuclear reactor physics, namely the finite cylindrical reactor and the 2D IAEA benchmark problem, are tested below. The first test case, adapted from [42], has an analytical solution; the 2D IAEA benchmark problem is a classical benchmark problem in nuclear physics [42] and has only a numerical solution. To verify the effectiveness of the distribution estimation method, some parameters in these models are assumed to be uncertain in the following tests.

4.1 The finite cylindrical reactor

A finite cylindrical reactor is a multiplying system of a uniform reactor in the shape of a cylinder of physical radius R and height H. Its behavior is modeled using the following two-dimensional monoenergetic diffusion equation:

where R_e = R + d, H_e = H + d, and d is the extrapolated length.

By replacing the Laplacian with its cylindrical form, as shown in Fig. 1, the analytical solution of (16) when k_eff = 1 is

φ(r, z) = A J_0(2.405 r / R_e) cos(π z / H_e),

where J_0(r) is the zeroth-order Bessel function of the first kind.

Our goal is to estimate the distribution of certain uncertain parameters from the observations. Below, we compare two alternative schemes and examine the convergence of the method by applying it to single-parameter estimation, followed by multiparameter estimation. We set A = 20 and H_e = 18 and use the ensemble Bayesian method proposed in the previous section to estimate the distribution of the parameter R_e. In other words, we consider the following system:

Fig. 1 (Color online) Geometry of the finite cylindrical reactor [43]

where x ∈ [0.5, 7.5] ⊂ R is a one-dimensional spatial position, μ is the input parameter of interest and is assumed to have a real distribution, and J_0 is the zeroth-order Bessel function of the first kind. y ∈ R is an observable physical quantity, and z ∈ R is the observation of y with noise ε. In this study, observations were prepared using synthetic data from simulations rather than real observations.
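Evaluated with SciPy's Bessel function, an assumed one-dimensional restriction of the analytical flux (at the mid-plane z = 0, with A = 20 and H_e = 18 as above) can be sketched as follows; the constant 2.405 approximates the first zero of J_0, so the flux vanishes at x = R_e.

```python
import numpy as np
from scipy.special import j0   # zeroth-order Bessel function of the first kind

A, He = 20.0, 18.0             # amplitude and extrapolated height from the text

def flux(x, Re, z=0.0):
    """Assumed analytical flux of the finite cylindrical reactor,
    A * J0(2.405 * x / Re) * cos(pi * z / He), restricted to 1-D at z = 0."""
    return A * j0(2.405 * x / Re) * np.cos(np.pi * z / He)

x = np.linspace(0.5, 7.5, 5)   # sensor range used in the experiments
print(flux(x, Re=11.0))        # flux decreases toward the extrapolated radius
```

A function of this form plays the role of f(x, μ) with μ = R_e when generating the synthetic observations.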

To prepare the observation data for our experiments, we first select m positions at equal intervals in [0.5, 7.5] and repeatedly obtain the observations at the corresponding positions under k values of μ sampled from the real distribution. Suppose the real distribution of μ is N(11.0, 0.36) and the prior distribution is N(12.0, 0.25), which could be an empirical estimate. In this case, the number of sensors m, the number of samples k, and the convergence and total step numbers N_con and N_tot of the Metropolis–Hastings algorithm, introduced earlier, affect the results. Subsequently, we fix the value of N = N_tot − N_con = 100 and discuss the effect of m, k, and N_con on the result. Before doing so, we compare two alternative schemes for using the samples from the posterior distribution p(μ_j | z_{1,j}, …, z_{m,j}).

4.1.1 Comparison of two alternative schemes

In this subsection, we compare these two schemes. Because μ_1, …, μ_k are k samples of μ, a straightforward idea is to restore μ_1, …, μ_k. After introducing the Metropolis–Hastings algorithm, we can sample from the posterior distribution p(μ_j | z_{1,j}, …, z_{m,j}); the traditional practice is to estimate each μ_j by the mean of these samples. The other scheme is to aggregate all samples from the posterior distributions p(μ_j | z_{1,j}, …, z_{m,j}), j = 1, …, k, as proposed earlier, rather than using their means.

Figure 2a, b presents the results of the first and second schemes, respectively, with m = 200, k = 200, and N_con = 100. Although the frequency distributions obtained by the two schemes have almost the same expectation, their variances are quite different; the second scheme obtains a variance much closer to the real one. We believe this is caused by the loss of observation information during the averaging step.

More importantly, the estimation of the second scheme is intuitively much better than that of the first.This is because more samples are obtained in the second scheme, resulting in a more robust estimation.

4.1.2 Convergence analysis

In this subsection, we discuss the effect of the parameters of the method on the result (i.e., the number of sensors m, the number of samples k, and the number of convergence steps N_con of the Metropolis–Hastings algorithm) and analyze the convergence of the method by setting N = N_tot − N_con = 100.

Figure 3 shows the effect of N_con on the performance of the proposed method under different settings of (m, k). It shows that the Metropolis–Hastings sampling used in the ensemble Bayesian method converges rapidly: once N_con is moderately large, e.g., N_con = 100, further increases hardly affect the performance of the method.

Figure 4 shows the effect of k on the performance of the method under different settings of m with N_con = 100 fixed. When k is small, the performance fluctuates; as k increases, the performance stabilizes, that is, the expectation and variance approach the real values, becoming much closer than the prior values.

Fig. 2 (Color online) Finite cylindrical reactor problem: comparison of schemes 1 and 2, in which scheme 1 aggregates k means whereas scheme 2 aggregates k × N samples. Not only is the result of scheme 2 intuitively better, but scheme 2 also obtains an expectation and variance, particularly the variance, closer to the real values

Figure 5 shows the effect of m on the performance of the method with N_con = 100 and k = 200 fixed. As m increases, the performance of the method improves; that is, the expectation and variance converge to neighborhoods of the real values, and this trend is particularly pronounced for the expectation.

Fig. 3 (Color online) Finite cylindrical reactor problem: the effect of increasing N_con on the results under three sets of parameters (m, k), i.e., (100, 100), (150, 150), and (200, 200). The curves in all three cases show that the effect of N_con on the results is minimal

Fig. 4 (Color online) Finite cylindrical reactor problem: with N_con = 100 fixed, the effect of increasing the number of samples k on the results under three settings of m, i.e., 100, 150, and 200. The curves in all three cases show that the performance of the method improves and quickly converges to the real situation

Fig. 5 (Color online) Finite cylindrical reactor problem: with N_con = 100 and k = 200 fixed, the effect of increasing the number of sensors m on the results. The curve clearly shows the convergence of the algorithm's performance. Compared with N_con and k, m has the greatest influence on the method

After applying the ensemble Bayesian method with m = 200, k = 200, and N_con = 100, we obtain the frequency distribution shown in Fig. 2(b). Intuitively, this frequency distribution provides an effective estimation of the real distribution. Equation (19) describes the KL divergence calculations. From the perspective of the KL divergence, we again conclude that the distribution estimated using the ensemble Bayesian method is much closer to the real distribution than the prior one.

4.1.3 Estimation of several parameters

In this section, we test the effectiveness of the method for estimating several parameters simultaneously, because in many cases several parameters are uncertain at once. We now use the ensemble Bayesian method proposed in this study to simultaneously estimate the distributions of the three parameters A, R_e, and H_e. Consider the following system:

where x ∈ [0.5, 7.5] ⊂ R is a one-dimensional spatial position; μ^(1), μ^(2), μ^(3) are the parameters of interest, assumed to have a real joint distribution; and J_0 is the zeroth-order Bessel function of the first kind. y ∈ R is an observable physical quantity, and z ∈ R is the observation of y with noise ε.

We experimented with simulation datasets. To prepare the observational data, we first selected 200 positions at equal intervals in [0.5, 7.5] and repeatedly obtained observations at the corresponding positions under 200 triples of (μ^(1), μ^(2), μ^(3)) sampled from the real joint distribution.

Suppose the real joint distribution of (μ^(1), μ^(2), μ^(3)) is a three-dimensional Gaussian distribution with expectation E(μ^(1), μ^(2), μ^(3)) and covariance matrix Cov(μ^(1), μ^(2), μ^(3)) as follows:

It should be noted that the correlations between them are assumed here.

The prior distribution π(μ^(1), μ^(2), μ^(3)) is also a three-dimensional Gaussian distribution, with expectation E_p(μ^(1), μ^(2), μ^(3)) and covariance matrix Cov_p(μ^(1), μ^(2), μ^(3)) as follows:

After applying the ensemble Bayesian method with m = 200, k = 200, and N_con = 100, we obtain three frequency distributions, as shown in Fig. 6. The expectations and variances of all these frequency distributions are close to those of the real marginal distributions. Intuitively, the frequency distributions provide an effective estimation; therefore, the ensemble Bayesian method is also suitable for the simultaneous estimation of several parameters. From the perspective of the KL divergence, the estimation also shows a significant improvement over the prior distribution. Equation (21) provides the calculations of the KL divergences.

Fig. 6 (Color online) Finite cylindrical reactor problem: results for the three marginal distributions when setting the number of sensors m = 200, the number of samples k = 200, and the number of convergence steps of the Metropolis–Hastings algorithm N_con = 100

4.2 The 2D IAEA benchmark problem

[42] gives the definition of the 2D IAEA benchmark problem, and [44] gives its implementation with a neutronics code. Figure 7 illustrates the geometry of the reactor. Note that only one quarter is shown because the rest can be inferred by symmetry along the x and y axes. We denote this quarter by Ω; it is composed of four subregions with different physical properties. Neumann boundary conditions apply on the left and bottom boundaries, and a mixed boundary condition on the external border.

Fig.7 (Color online) Geometry of 2D IBP, upper octant: region assignments, lower octant: fuel assembly identification [44]

The 2D IAEA benchmark problem is modeled by two-dimensional two-group diffusion equations. Specifically, the flux φ = (φ_1, φ_2)^T (indices 1 and 2 denote the high-energy and thermal groups, respectively) satisfies the following eigenvalue problem [5]: find (λ, φ) ∈ C × L^∞(Ω) × L^∞(Ω), s.t.

where Σ_{1→2} is the macroscopic scattering cross section from group 1 to group 2; D_i, Σ_{a,i}, Σ_{f,i}, and χ_i are the diffusion coefficient, macroscopic absorption cross section, macroscopic fission cross section, and fission spectrum of group i, i ∈ {1, 2}, respectively; and ν is the average number of neutrons emitted per fission. In other words, these quantities are the model parameters. The general settings are listed in Table 1.

We then assume that D_2 in Ω_4 is uncertain and use the proposed ensemble Bayesian method to estimate its distribution, as it is probably one of the most important parameters. All other parameters are assumed certain and set according to Table 1. We denote D_2 in Ω_4 by μ; this is the quantity of interest.

We experimented with simulation datasets. To prepare the observation data, we first selected 45 positions (i.e., (10, 10), (30, 10), (30, 30), …, (170, 10), …, (170, 170)) in the region Ω and repeatedly obtained observations at the corresponding positions under 200 samples of μ drawn from the real distribution.

Moreover, the two-dimensional two-group diffusion equations (22) must be solved within the algorithm. This was achieved by employing the generic high-quality finite element solver FreeFem++ [45].

Suppose that the real distribution of μ is a Gaussian distribution with expectation E(μ) and variance Var(μ) as follows:

where E(μ) is the value listed in Table 1.

The prior distribution π(μ) is also a Gaussian distribution, with expectation E_p(μ) and variance Var_p(μ) as follows:

Table 1 General settings of the parameters of the IAEA 2D benchmark problem

After applying the ensemble Bayesian method with N_con = 100, we obtain the frequency distribution shown in Fig. 8. The expectation of the frequency distribution is close to the real expectation, and the variance is also closer to the real variance than that of the prior distribution. Furthermore, the frequency distribution intuitively provides an effective estimation. From the perspective of the KL divergence, the estimation again shows a significant improvement over the prior. Equation (25) describes the KL divergence calculations. In this case, we used the engineering software FreeFem++ within our algorithm because this model has only numerical solutions; the proposed ensemble Bayesian method therefore has potential for engineering applications.

For the finite cylindrical reactor, the computation time of the one-dimensional case (Fig. 2b) is 111.7 s, and that of the three-dimensional case (Fig. 6) is 359.6 s. For the 2D IAEA benchmark (Fig. 8), the computation time is approximately 130,000 s. The 2D IAEA benchmark takes much longer mainly because FreeFem++ accounts for most of the runtime, and in particular because FreeFem++ is not called from the code efficiently. Therefore, a more efficient way of calling FreeFem++ would significantly reduce the computing time.
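One way to reduce this call overhead, sketched below under assumptions, is to batch all parameter samples into a single FreeFem++ driver script so that the executable is launched once per batch rather than once per sample. The script name `diffusion_2g.edp` and the include-based driver layout are hypothetical; only the generated driver is exercised here, not an actual FreeFem++ run:

```python
import subprocess
import tempfile
from pathlib import Path

def run_freefem_batch(mu_values, template="diffusion_2g.edp"):
    """Write all parameter samples into one .edp driver so FreeFem++ starts
    once per batch instead of once per sample (sketch; the template name
    and output convention are assumptions)."""
    driver = Path(tempfile.mkdtemp()) / "batch.edp"
    lines = [f"real[int] mus({len(mu_values)});"]
    lines += [f"mus[{i}] = {m};" for i, m in enumerate(mu_values)]
    lines += [
        "for (int i = 0; i < mus.n; i++) {",
        f'    include "{template}"  // solves Eq. (22) with D2 = mus[i]',
        "}",
    ]
    driver.write_text("\n".join(lines))
    # One process launch amortizes FreeFem++ startup cost over the batch;
    # the caller would pass this to subprocess.run(...).
    return ["FreeFem++", "-nw", str(driver)]

cmd = run_freefem_batch([0.30, 0.31, 0.29])
print(cmd[0])  # FreeFem++
```

An alternative with the same effect would be a long-lived worker process communicating through files or pipes; either way, the per-sample startup cost disappears.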

Fig. 8 (Color online) 2D IAEA benchmark problem: result of the method with the number of samples k = 200 and the number of convergence steps of the Metropolis–Hastings algorithm Ncon = 100
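The Metropolis–Hastings step referenced in the caption, with its Ncon convergence (burn-in) iterations, can be sketched as follows. The random-walk proposal, step size, and the illustrative log-posterior are assumptions, not the paper's exact settings:

```python
import numpy as np

def metropolis_hastings(log_post, mu0, n_con=100, n_keep=200, step=0.005, seed=0):
    """Random-walk Metropolis-Hastings: discard the first `n_con`
    convergence (burn-in) steps, then keep `n_keep` samples of mu."""
    rng = np.random.default_rng(seed)
    mu, lp = mu0, log_post(mu0)
    kept = []
    for i in range(n_con + n_keep):
        prop = mu + step * rng.standard_normal()     # symmetric proposal
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:      # accept/reject
            mu, lp = prop, lp_prop
        if i >= n_con:                               # past burn-in: keep
            kept.append(mu)
    return np.array(kept)

# Illustrative unnormalized log-posterior: a Gaussian centered at 0.30
# (in the paper this would combine the likelihood of the observations
# with the prior pi(mu)).
log_post = lambda m: -0.5 * ((m - 0.30) / 0.01) ** 2
samples = metropolis_hastings(log_post, mu0=0.32)
print(samples.shape)  # (200,)
```

Running this sampler once per observation set and pooling the kept samples is what produces a frequency distribution of the kind shown in Fig. 8.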

5 Conclusion

In this study, we considered model parameters from the perspective of random variables in the context of nuclear reactor engineering and proposed a general form of the parameter distribution inference problem. For this parameter distribution estimation problem, we conducted a preliminary exploration and proposed an ensemble Bayesian method that estimates the parameters by obtaining frequency distributions, combined with a special type of experimental setup. We also introduced KL divergence to quantify the estimation performance of the method. Various numerical experiments were conducted, including a finite cylindrical reactor model, which has an analytical solution, and the 2D IAEA benchmark problem, which has only a numerical solution. From the results of these tests, the following conclusions were drawn:

• For both models, the frequency distributions intuitively provide effective estimates of the parameters. The expectations are significantly close to the real ones, and the variances are more accurate than those of the prior distributions. From the perspective of KL divergence, the method also performs well;

• The ensemble Bayesian method can estimate several parameters simultaneously, even when there are correlations between them;

• When the model has only a numerical solution, we used the engineering software FreeFem++ to implement the algorithm. The method also works well in this case, which indicates its potential for engineering applications;

• The convergence speed of the method is fast. Thus, in general, the proposed ensemble Bayesian method has promising application prospects.

The proposed method shows high potential for engineering applications in correcting prior distributions when data assimilation techniques are used for parameter estimation. Further studies could investigate the performance of the proposed method for moderate numbers of parameters, for non-Gaussian parameters, and for characteristic parameters that are likely to be difficult to reproduce. An important review of machine learning (ML) in nuclear physics [46] not only describes the different methodologies used in ML algorithms and techniques and some of their applications, particularly in low- and intermediate-energy nuclear physics, but also provides a valuable summary and outlook on possible application directions and improvements of ML algorithms in this field. Inspired by this review, we plan to improve the efficiency of the method; our initial idea is to improve the efficiency of the Metropolis–Hastings sampling used in this study. Furthermore, we will investigate how to incorporate more physics-informed or physics-guided elements into new machine learning algorithms.

Author Contributions All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Jia-Qin Zeng, He-Lin Gong, and Hai-Xiang Zhang. The first draft of the manuscript was written by Jia-Qin Zeng, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Data availability The data that support the findings of this study are openly available in Science Data Bank at https://www.doi.org/10.57760/sciencedb.j00186.00333 and https://cstr.cn/31253.11.sciencedb.j00186.00333.

Declarations

Conflict of interest The authors declare that they have no competing interests.