Problem of Reliability Justification of Computation Error Estimates

The problem of reliability justification of errors estimates of results obtained by numerical methods is under consideration. The application possibility conditions of the known methods and their downsides are discussed. A new method is offered including numerical filtration of results, choice of a standard and correctness verification of its choice. This method allows the obtainment and visualization of certain error estimates and justification of result reliability. The numerical solutions obtained by different methods of the Stokes solitary wave problem are shown. Also, the given problem results computed by different authors with help of the known methods are cited. Computations possibility of results with high accuracy and preference of the suggested approach for estimate reliability justification are demonstrated by the sample of this problem.


Introduction 1.
We belong to the rather large group of investigators engaged in computational experiments in different fields of science, and as with many of them we have found the problem of reliability justification of results obtained with application of numerical methods very important.
An investigation of scientific problems presents rather difficult processes as far as they apply to the modern technologies of mathematical modelling, numerical experiment and program complex development for this type of experiment.The solving of a multi-parameters problem is accompanied by the appearance of different error types on all these stages.This includes an error of input data, round-off errors, an error of numerical method and an additional unobservable error.That's why the result reliability must be justified.The sources of these errors types are restriction of time, memory capacity, mantissa digits number and reliability.
The analysis of a great number of works on this subject shows that insufficient attention has been paid to the questions of reliability justification of results obtained by mathematical modelling.The verification of numerical data presents great complexity because a considerable part of each investigation (for example a program) is usually not shown.So, the quality of numerical errors analysis depends on researcher experience and intuition.
The comparison of numerical data with physical experiments results often applying to error estimation allows estimation of the approximation error only.But it contains a mathematical model error, an experiment error and an error of computation: . Without the computation error estimate this sum doesn't allow estimation of the error of the mathematical model.In this case the sum module can be less than every summand if these summands have different signs.The  Assuming the model error dependencies have smooth characters, the number of checked physical experiments can be essentially decreased to verify the mathematical model in some range.
Simplified methods of reliability justification of error estimation predominating in computation practice don't possess the required reliability.
Evidently, the earliest method provides a difference of values from distinct experiments as the error estimate: .
(1) The values z1 and z2 can be obtained by different numerical methods or by one method with various values of a digitization parameter n (for example, number of grid knots).
If z1 and z2 are approximate values, i.e. , , where z is an accurate value, , are unknown errors, then for fulfilling of the inequality: , the estimates and are valid if and only if the errors and have different signs.In this case, if any sign combination is considered to be equiprobable, then half of the estimates are incorrect.If we take into account the results obtained by one method with variation of the digitization parameter n, then the errors most likely have one sign, but the error of value obtained for greater than n can be considerably less than the other error.Then the estimate (1) becomes rather acceptable.Along with the simplicity of this method, it can explain the popularity of such estimations.
So, the estimate (1) can be used only with fulfilment of the defined conditions, which can be violated because of presence of different kinds of error.Therefore, a procedure of the conditions fulfilment verification and estimate justification must exist.But now it is not observed at all.Without such a procedure, errors of estimates of ten to100 times are possible, if the formula (1) is used, which is shown below.
At the beginning of the 20th century the Runge rule of error estimation and the Richardson rule of more precise result definition were worked out on the basis of representation of dependence of an approximate result on n as [1]: where z is an accurate value; zn is an approximate result obtained for the mesh points number (or number of sum addends) equal to n; c1 is a coefficient independent of n; k is an accuracy order of method.
Under the supposition that the small quantity may be neglected, we have one equation with two unknowns z and c1 for known k (2).For another n ( ) we obtain a similar equation.Solving the system of the two equations the approximate equalities are obtained: The first equality gives the required value (extrapolated by the Richardson rule).The second one provides an error estimate of the approximate value zn by the Runge rule.
The mathematical model of an error in more common cases is represented as follows: . (4) For the finite difference formulas of numerical differentiation and for the Newton-Cotes quadrature formulas of numerical integration the values kj are integer positive numbers according to the expansion by the Taylor and Euler-Maclaurin formulas [Bakhvalov, Zhidkov & Kobel'kov, 1987].
Let us consider a sequence of the values and a sequence of numerical results , obtained for corresponding different mesh including nodes.Q = 2 in most cases.Let us neglect the small quantities and substitute the pairs in (4), and then we have system of L + 1 linear algebraic equations.This system is the same as the system in the problem of interpolation of some function f(x), ( ), by algebraic polynomial given with a table of the values .We define the required value z by extrapolation of to the value x = 0.If the Aitken recurrent relation [Bakhvalov, Zhidkov & Kobel'kov, 1987] is used to obtain the interpolational polynomial, the problem is reduced to sequential application of the Richardson extrapolation formula (3) to the all pairs of neighbouring values (for i differing by unit) for .Then the pairs of the extrapolated values are extrapolated once more for and so on.The Romberg method [Romberg, 1955] consists of the steps mentioned above.
The algorithms , , and others [Aitken, 1926] are applied, if the parameters are unknown in (2), (4).Many other algorithms of the convergence acceleration are worked out for extrapolation and more precise definitions of computation results, in particular for integration of systems of stiff and singular ordinary differential equations and for solving of the Euler and the Navier-Stokes equations and so on.
Note that the possibility of neglecting the last addendum in (2) and ( 4) is the condition of applicability of these algorithms.It is justified in scientific literature by proof that the last addendum is an infinitesimal quantity or .But n is always bounded under the conditions of the computer resources restriction.For a finite n an infinitesimal quantity may not have a really small value in comparison with .The presence of round-off errors restricts this magnitude below and leads to its growth with increase of n in most cases.Also the possibility of error existence in programs isn't to be ignored.
Thus, a strict proof of convergence can't be used as justification of the practical error estimate of numerical results.Moreover, the estimation methods of partial errors are known (for example, conservation law violation in nonstationary problem solutions).There is a risk of making a mistake in the whole error if the partial estimates are applied.
A lot of published numerical results contain errors in digits declared as valid.
Let us consider the problem of the Stokes solitary wave as an example.This problem has been solved by many investigators [Longuet-Higgins & Fenton, 1974] during recent years.This problem statement is considered in detail at the end of the paper.
The numerical results of the Stokes solitary wave parameters obtained by the different authors are presented in Tables 1 and 2. The errors in units of the last digit (if the error is greater than unit) are given in brackets.The authors of the cited papers declare error estimates on the level of unit of the last adduced digits as a rule.The comparison of different authors' results shows that the authors often make mistakes of error estimates five to ten times (at least every second author).The values of the other parameters of the Stokes solitary wave presented in Table 2 (where a is amplitude, mmass, i -impulse, c -circulation, t -kinetic energy, u -potential energy; see ( 20)-( 25)) show that all cited authors made mistakes, sometimes in the last two digits.The results of the final column are compared with the results in [Sherykhalina & Zhitnikov, 2001] (Table 3).

Longuet-Higgins & Fenton [Longuet-Higgins & Fenton, 1974 ]
Fox [Fox, 1977] Williams [Williams, 1981] (computed) Williams [Williams, 1981] (extrapolated) Evans & Ford [Evans & Ford, 1996 Note that Williams [Williams, 1981] applied the linear extrapolation of parameters dependence on the residual of a boundary condition for more precise definition of computational results.Table 2 shows that such extrapolation doesn't lead to refinement of the results.
We need to note that the authors of the cited papers applied the generally accepted approach to error estimation and its justification (truly, its absence) and that is why there are no pretensions.And we also do not want to assert that all or most parts of different results published in scientific literature have 1-2 incorrect digits.There are the samples of problem solutions in which a real accuracy corresponds to a declared one.But, as a rule, it is not possible to define where this correspondence takes place and where not, if you have published data only.
It should be noted that the widely accepted mathematical software packages have the same weaknesses.From one side, software developers cannot guarantee reliability of results obtained by a user.From the other side the accompanying documentation has no instructions for justification of obtained results.A software product user has to think of a way of how to do this.
That's why it is possible now to estimate an error in published results correctly only if more precise results appear (or it is known that the errors of two results have different signs).But as Table 1 shows, that chronology of more precise result obtainment can be violated.So, the problem of obtained estimates justification remains relevant for this method of verification too.
With this connection, in this paper a method is proposed allowing at first the determination of a range in which we can confirm the estimates obtained by the Runge rule, the Romberg method and so on.Secondly, estimating with the maximum precision is required: this can be achieved in analysis of results of a concrete numerical experiment and can provide physical reliability of obtained estimates.The physical reliability is guaranteed by determination of an approximate value of a required parameter (called 'standard' below); definition and independent justification of its error estimate (an indeterminacy interval); and checking of intersection fact of intervals obtained by different methods.

Mathematical Model of Computing Process 2.
The mathematical model of computing process by many numerical methods of some value z can be presented as the function: , (5) where z is a desired value; zn is its approximate result obtained for the mesh points number equal to n; and f1,…, fL are some functions of the mesh points number.All the constants and functions can have both real and complex values.
For the finite difference formulas of numerical differentiation and the Newton-Cotes quadrature formulas , where kj are real numbers as mentioned above.For the finite difference methods of solving of mathematical physics equations the existence of multiple error components depending on n is confirmed by the computing experiment in [Zhitnikov & Sherykhalina, 2012].Some numerical methods correspond to the function , where , in particular considered in [Zhitnikov & Sherykhalina, 2013].
The quantity contains the components that are not included in the sum, the remainder, round-off error and many other components due to both a numerical method and a specific program implementation.Therefore does not tend to zero with increasing n but becomes greater in most cases.
Let's consider a finite sequence of numerical results obtained for grids with ni nodes.Then we can write the equalities system as such: (6)

If
is the unknown required parameter along with z, c1, … cL, then the number of the unknowns in the system (6) is always greater than the equations number, and this system of equations has an infinite number of solutions, including an exact one.
We can get an approximate value of the exact solution z and the error estimate applying the regularization methods [Morozov, 1987].However, the known methods of regularization require some additional a priori information about the unknowns , and perhaps about the required parameters z, cj.Estimates obtained by such methods are dependent on this additional information, so they can hardly be regarded as error estimates.
In order to avoid incorrectness we propose dividing the problem into two parts: the problem of the mathematical model identification based on the results of the numerical experiment and the testing problem by comparison with known partial solutions or other methods.
The first problem is not to determine the theoretical values z, cj, but to decompose into components of the form (5). The mathematical notation of this problem has the same form ( 6), but and all other members of ( 5) have an entirely different meaning, because it is known that does not include components and the constant.This problem can be solved approximately by filtration, i.e. by elimination of error components.It makes possible the obtainment of the standard and the error estimate .In this case the resulting uncertainty interval can contain or not contain the exact value of z, for example, due to an error in the program.
The second problem is testing, i.e. comparing with a known particular exact solution (checking falling into the resulting interval) or with the approximate solution obtained independently by other numerical methods (checking intersection of the uncertainty intervals).This method of additional information application does not affect the estimates obtained previously by independent methods, but only confirms or disproves them.
A theoretical estimation of reliability (confidence probability) of the joint solution of these two problems is presented in [Zhitnikov & Sherykhalina, 1999].

Numerical filtration 3.
Numerical filtration [Zhitnikov & Sherykhalina, 2012;Zhitnikov & Sherykhalina, 2013] is the sequential removal (elimination) of error components, and the determination of the filtered sequences , j = 1, …, L (j is a serial number of filtering).For the equation ( 6) the filtration is reduced to the linear combination , where and are determined by solving of two equations systems: , m = j + 1, …, L.
The filtration result is the new sequence without the components fj(n).This sequence can be subjected to repeated filtration because of preservation of the form of the components that corresponds to fulfilment of the condition .If (for real or complex ), then , , and each of the exponential components is the geometrical progression.If , then , , and (7) coincides with the Richardson's extrapolation formula.In this case the algorithm of multiple filtrations represents the Romberg method [Romberg, 1955].
Results of multiple filtrations can be presented as a matrix of computed and filtered values: (8) Numerical values given in this matrix must be analysed for error estimation and reliability justification of these estimates.The data processing results obtained under computation of the integral , are represented at logarithmic scale in Fig. 1.The bold curves are results of comparison with an accurate solution.Decimal logarithms of relative errors are put on the ordinate axis, i.e. an accuracy expressed in number of accurate decimal digits.The logarithmic scale on the abscise axis is chosen for power components of error dependency ; for exponential components the scale is linear.For this choice of the scale every component of the dependence ( 5) is represented as a straight line.
The lines denoted as 0 in Fig. 1 correspond to an accuracy of computed values of the integral, as 1, 2… and so on are an accuracy of results of the first, second and so on filtrations.
The tangent of the line angle inclination (which is equal to k1, k2, …) varies as a result of every filtration, and an upshift takes place.This demonstrates the accuracy increase.Such lines behaviour confirms the presence and elimination result of concrete components of the dependence.
Comparison and analysis of several lines location allows the determination of both error estimates and fuzzifying of these estimates.The fuzzifying is an error estimate of the error estimate.The ordinates difference of lines for definite n is equal to decimal logarithm of relative fuzzifying of the estimate corresponding to the lower line.So the scale unit corresponds to relative fuzzifying equal to 0.1, two unite to 0.01, etc.
Every line in Fig. 1 has two parts.The first one has inclination corresponding to an exponent of the function.The line behaviour in the second part in contrast to the first one has a chaotic character, which is explained by the prevalence of the error component (n), the module of which can be approximated by the straight line = 16.5 ½ lg n.
Let us consider the results of the Runge rule application (Fig. 1b).It is easy to note that the Runge estimate is close to a real error in the range separated by a half of the scale unit from a level of the irregular error = 16.5 ½ lg n.The Runge rule is reduced to comparison of an approximate value with the right neighbouring one in (8).Note, that the Richardson extrapolation application is competitive insofar as it is possible only under prevalence of a component in the form of a power function with a definite exponent.In this case the filtration allows a decrease of the error sensitivity (by the factor of ten or more times) and the comparison with the filtered value provides the same result as with the accurate one.The fuzzifying in this case is very small (less than 0.1) and the estimates are rather precise.
Displacement upwards of second parts of the lines in Fig. 2b (with increase of filtrations number) corresponds to the condition of irregular error prevalence.Difference of the values couples can be used as the error estimate in the range of prevalence of irregular error with changing sign if it is used several times.In the Richardson extrapolation formula this difference is divided into the rather high value .That is why the difference between approximate and extrapolated values is a small quantity and the Runge estimate gives results with overstated accuracy.caused by the dependence of the estimate on concrete change regularity of error.In the case of standard error existence, the estimate restriction on this error level takes place (Fig. 1b, the thin curves), so the estimates remain correct.
Detection of a locality with approximately constant error allows correction of the standard and increases the result accuracy.
However, the standard choice is inconvenient because of the need for expert intervention in the estimating process online.

The Standard Selection Rule 5.
To formalize the procedure of the standard selection a two-step method for error estimation is proposed.
The sequence has the form of ( 6) each time after repeating filtration.Each member of the sequence contains at least two unknowns: the required z and the error.
In order to avoid uncertainty, division of the stages of the error estimation and definition of the required z is suggested.For this the first stage is the filtration by the formula: , (9) which excludes the unknown required z from the system (6).Thus, the further filtration by ( 7) is an error estimation, independent on the standard z choice (Fig. 2a).The estimate obtained by this method allows choice of the best value, from the point of view of an error minimum (or a combination of similar error values), ratio of and j = j0.Thus, the problem is reduced to the minimization problem for some constant k = 0, 1, 2, ... to find minimum for i and j with the constraints: , .
Values and j = j0 which are obtained by solving the minimization problem are used to determine the values (the standard) by filtration of the sequence by the formula (7).Superposition of the diagrams for pairwise subtraction (9) and for comparison with the standard allows verification of correctness of the standard choice.Fig. 2 shows that the results coincide with estimated precision.
Note that the transformation (9) changes the components of ( 5) as such: but for it is a negligible change (Fig. 2, ).In the case of considerable distinctions the variation of error components (11) can be taken into account by means of modification of the transformations ( 9), ( 7): , , j = 1, 2, ….
Thus the formal procedures of standard determination and verification of its choice correction are defined.Together with the estimate (10), the interval of uncertainty is determined.To define the other components of the dependence (1) the identification method offered in [Zhitnikov, Sherykhalina & Porechny, 2010], based on filtration, eliminates the constant from (1) and modifies (1) so that the coefficients c1, c2, etc. appear in the first place in the expansion.Then the sequence determination of these constants is carried out by the twostage filtration method considered above.

Formulation and Solution of the Problem of the Stokes Solitary Wave 6.
Let us consider the problem of the Stokes solitary wave, mentioned above, as an example.
The stream of ideal fluid flows along the horizontal rectilinear wall ADC (Fig. 3 ).The acceleration of gravity points vertically downwards.The solution for a solitary wave of the maximum height is sought.The wave possesses 2 /3 cusp at its crest (the Stokes wave).Asymptotic width of the flow is h, and velocity at infinity is V .A pressure P on the free surface is equal to the atmospheric pressure P0.
We chose the coordinate system so that the -axis coincides with the straight line ADC, and Y-axis points vertically upwards and passes through the cusp at the wave crest (the point ).We consider the complex variable plane Z = X + iY.
The free surface shape is unknown.But on the free surface 'BC' Bernoulli's equation relates the velocity vector module V to the height of point Y for P = P0: The domain corresponding to the stream on the complex potential plane W is a band (Fig. 3b) as far as the boundaries are stream lines.
We choose the band = + i of the width 1 as the parametric variable range (Fig. 4b).In this case the free surface is mapped onto the real axis = + i0.
So, the dependence W( ) is determined by the formula: , ( 13) where is the velocity potential, is a stream function, and is fluid expenditure in the stream.It is convenient to use the plane of logarithmic hodograph of velocity (the Levi-Chivita function) for problem solving: . ( 14) Here is complex conjugate quantity to dimensionless velocity, and V and are module and angle of inclination of velocity vector to X -axis, .
The function ( ) must satisfy the following conditions: 1) the real part for Re = 0, ; 2) the real part for ; 3) the equation ( 12) connects the real and imaginary parts ( ) at Im = 0; According to the symmetry we can consider the right half of the stream.We represent as a sum: , where 2( ) is the function satisfying the conditions 1, 2, 4, 5. for is a result of such representation.So, the function is continuous on the boundary.The function 1( ) is defined as follows.The solution is sought on the boundary = at nodes m (m = 0, …, n).
The values are required.We assume for = n, because decreases rapidly (as exponent), if .Moreover , as far as (according to the problem conditions).The values at points between the nodes are defined with the help of a cubic spline which has two continuous derivatives.The Schwartz formula [Lavrentiev & Shabat, 1973] is applied for restoration of the function 1( ); with regard to the function, is an odd function of and : . For according to the Sokhotskii formula [Lavrentiev & Shabat, 1973]: . The calculation of the principal value of the integral for 0 m < n is practically made by the formula: .
(15) The two points Gauss quadrature formula is applied for numerical integration between the two knots m-1 and m.The summand is introduced for solution of the singularity isolation at the point (Fig. 4a).Let's present the equation (12) in differential form:  .( 17) This function satisfies the problem conditions on AD and DB and it has necessary singularity for 0 as is shown below.
The following estimates take place for , and : , .
Note, the values , as far as is an odd function and is even function on .The obtained estimates are substituted in ( 16) and then: .Setting equal the values of the same order O(1) and O( ) we have the expressions: , . ( 19) The value is defined from the solution of the equation ( 19) that was obtained earlier in [Longuet-Higgins & Fox, 1978].
With regard to ( 13), ( 14) the free surface shape is determined by numerical integration of the expression: . The formula of mean rectangles is applied for the numerical integration with the following filtration for achievement of the accuracy up to the order 10 16 .
The problem is solved numerically by the collocation method.The equation ( 12) is fulfilled at the discrete points of the axis BA' ( , ).Moreover, the equation ( 18) must be fulfilled.The system of n + 1 nonlinear equations is solved numerically with respect to the parameters Fr, C1, ( ) by the Newton method with step regulation.The sum of residuals squares in the all equations is minimized.The solution search is stopped if the residuals in modulus are less than 10 15 .
The nodes are defined with varying steps according to the formula:
(26) We have published the results of this problem in [Sherykhalina & Zhitnikov, 2001] solved by the Levi-Chivita method taking into account the singularities of the solution at the cusp point and at the infinity analogously [Terentiev & Zhitnikov, 2006].These results are presented in Table 3 (the columns 2, 3).The filtration formula, similar to the Aitken one, applying the least squares method [Sherykhalina & Zhitnikov, 2001;Zhitnikov & Sherykhalina, 2012], is used for more precise definition and error estimation.The obtained estimates allow declaration of accuracy up to the 13 th digit for the Froude number and up to two to three units of the 13 th digit for the other parameters (Table 3, column 3).Later on [Sherykhalina & Zhitnikov, 2001], the -algorithm is applied to the data obtained by the same method.The dependencies diagrams covering takes place for filtration results analysis for different sequences beginning from 5, 6, 7, 8 and further multiple doubling of these initial values.This allows definition of a more precise value of the Froude number up to 14 digits (the last line of Table 1) and the mass value up to the unit of the 13 th digit.This confirms the estimates of these parameters in [Sherykhalina & Zhitnikov, 2001;Zhitnikov & Sherykhalina, 2012].But it is necessary to use another method for independent verification of these estimates.The method considered in this part is developed for this purpose.
The results calculated by the given method and filtered by the rules described above are presented in Table 3 (the columns 4, 5).The diagrams covering the dependencies obtained for different sequences beginning from 5, 6, 7, 8, 9 and further multiple doubling of these initial values (up to n = 1280) is used for more detailed investigation of the dependencies.The filtration of the every sequence is carried out independently, but the values with more closed are used for pairwise subtraction (9).Such improvement of the filtration method allows detection of a small deviation of lines behaviour corresponding to the pairwise subtraction and comparison with the standard, which requires additional analysis.The upper lines (number 3) are used for an approximate estimate of the level of irregular error and error estimate fuzzifying but not for error estimates of obtained data.
According to these estimates, the results given in Table 3 (column 5) have the confirmed accuracy of up to two to three units of the 15 th digit for the Froude number and the amplitude and up to two to three units of the 14 th digit for the other parameters.Thus, three problems are useful for error estimation from the practical point of view.The first one is the definition of the range of digitization parameter n and the error , where the error estimation by more simple rules (as Runge and Romberg estimations) gives acceptable results.The second problem is to provide the highest possible accuracy of the standard for a given experiment and its error estimate.The third problem is investigation of error components for search mistakes in the program and improvement of numerical algorithms.All these problems are solved by the filtration methods.
A procedure including two stages has been suggested: the identification of a mathematical model, by means of multicomponent analysis of numerical experiment results, and testing.
The filtration realized at the stage of postprocessor treatment of numerical experiment data allows the obtainment of error estimates and essentially raises the efficiency of numerical algorithms.Testing with help of accurate or approximate (obtained by the other method) particular solutions makes possible the confirmation or disproval of these estimates.So the reliability of numerical data and conclusions based on the estimates increases.
The investigation of error dependencies on digitization number n (for example, collocation points number) allows formulation of the conclusion of three areas.At first one can see the area of chaos caused by rough digitization; then the area of prevalence of regular error components, which can be filtered.And then there is the area of the round-off error predominance.Boundaries of the areas are determined experimentally.
The solving of the Stokes solitary wave problem by the new method allows the confirmation of the estimates obtained earlier in [Sherykhalina & Zhitnikov, 2001] with help of the filtration method.

Fig. 1 .
Fig. 1.Computation of the integral: (a) comparing with an accurate value and with the standard; (b) the Runge error estimation.The dotted line = 16.5 ½ lg n.Therefore, a comparison of the calculated and the filtered values with a number, called the 'standard', is proposed.The error in the standard choice does not perform the marked disadvantage of the Runge rule ('seeming preciseness'),

Fig. 2 .
Fig. 2. The results of the two-stage error estimate with exclusion of the required value by the formula (9): (a) differences estimate; (b) comparison with the standard (thin lines) and covering the diagram (a) (bold lines).

Fig. 3 .
Fig. 3. Domain shape corresponding to the flow on the planes: (a) physical; (b) complex potential.

Fig. 4 .
Fig. 4. Domain shape corresponding to the flow on the planes: (a) logarithmic hodograph of velocity; (b) parametrical variable .
. For > 0 the ratio of nodes density at the end and beginning of the grid is .
situation can adversely vary if the problem parameters change.The possibility to estimate the model error appears when independent

Table 1 .
The values of Froude number (Fr) published by different authors. ]

Table 3 .
The values of the solitary wave parameters obtained by different methods.