https://marshallward.org/mom6vv

Verification and Validation of MOM6

Alistair Adcroft
Robert Hallberg
Marshall Ward

2020 June 8

So I think everyone here would agree that the MOM6 community is growing, and will probably continue to grow...

which means the code will start to evolve in new and unexpected ways - which is a good thing...

But that also probably means more errors and mistakes, and we of course need to be ready for them.

So I'm going to go over some of the elements of the MOM6 testing framework...

Partly to give you all confidence in the model, but also to help prepare anyone who plans to contribute.

Now no one can even know if a code is "right", but there are ways to at least ways to show if it's wrong, anr assure ourselves that it's at least basically doing what we expect.

So when software engineering talk about this stuff, they introduce the concepts of *verification* and *validation* when describing the testing of software.

Which I think is a good perspective for understading how we do the testing of MOM6.

Verification

Am I building the product right?

Validation

Am I building the right product?

Barry Boehm, Software Risk Management (1989)

Verification

What are the design specifications of my model?

Does it compile on target platforms?
Are the equations dimensionally consistent?
Does parallelization change the answers?

Verification is the confirmation of design specifications.

Validation

Does our model meet operational needs?

Does it produce realistic simulations?
Are relevant physical features present?
Can I reproduce my old simulations?

Validation is an assessment of the final product.

These are the broader goals which cannot be confirmed until the project is done.

We don't design something like a AMOC into the model. We get the microphysics, the momentum and mass transport, and so on to the best of our ability, and then we essentially assert from physical principles that it all works out.

In a way, it separates the mechanics from the theoretical analysis.

A strong verification system can let us declare with great confidence that our code is free of dumb bugs, that our stencils are correct, and that we are getting the microphysics right.
A strong validation system is the assessment process, where we produce output in a careful, controlled, and reproducible manner, and can confidently declare that the model output is what we say it is.

Since it is often based on client evaluation, it's often considered an external assessment.

Looking past semantics, I think this are clear and independent concepts, and they map well onto the climate model development process.

V&V in Development

So a lot of these models are kind of historical, and arguably were just invented by managers to define progress. But they all do acknowledge some form of verification and validation step.

The waterfall method (upper left) has a single stage of development, like building a bridge or launching a space rocket, so validation is not possible. Verification must be extremely thorough.

More iterative models like the V-model treat the development process as verification, and then use validation to re-define the verification steps.

Modern Agile methods define multiple development cycles, including verification and validation after each "scrum". Scrum-like methods can blur the distinction between verification and validation.

MOM contributions don't typically map onto any of these models (maybe agile if you're feeling generous), but we do embrace V&V in our development cycle.

MOM6 Development

V&V in MOM6

Fork from a community repository
Implement feature
Submit as Pull Request™ (PR)
Trigger V&V events
- Automated verification
- Manual validation

All contributions must pass verification and validation before merge.

MOM6 Verification

All changes sent to CI (Travis) for verification

Verification Tests

Test	Description
grid	Symmetric/Asymmetric memory grids
layout	1×1 and 2×1 domain decomposition
rotation	Index rotation
restart	Restart at mid-run
repro	Optimized reproducible mode
openmp	OpenMP (single-thread)
nan	NaN array initialization
dim	Dimensional scaling

Each test requires bit reproducibility

Tests in Action

Regressions

What if we want answers to change?

MOM6 Validation

Current validation includes over 60 tests

Hub Validation

Solution verification

ocean.stats:

Step,       Day,  Truncs,      Energy/Mass,      Maximum CFL,  Mean Sea Level,  Total Mass,  Mean Salin, Mean Temp, Frac Mass Err,   Salin Err,    Temp Err
          [days]                 [m2 s-2]           [Nondim]       [m]             [kg]         [PSU]      [degC]       [Nondim]        [PSU]        [degC]
   0,       0.000,     0, En 7.2161166068132286E-27, CFL  0.00000, SL  1.8190E-12, M 1.39637E+20, S 35.0000, T 13.3483, Me  0.00E+00, Se  0.00E+00, Te  0.00E+00
  12,       0.500,     0, En 2.7781004671136538E-04, CFL  0.00011, SL  1.1369E-12, M 1.39637E+20, S 35.0000, T 13.3484, Me -6.09E-17, Se -3.90E-15, Te -1.17E-15
  24,       1.000,     0, En 2.7734897826598717E-04, CFL  0.00014, SL  1.8190E-12, M 1.39637E+20, S 35.0000, T 13.3486, Me  2.89E-17, Se  8.80E-17, Te -2.88E-16

Based on global metrics (energy, mass, etc)

Diagnostic verification

chksum_diag:

u-point: mean=   1.1239682303793666E-04 min=  -6.7187595818683776E-03 max=   3.3480219779204019E-02 ocean_model-u
u-point: c=     21851 ocean_model-u
v-point: mean=   1.2076392816784489E-03 min=  -8.3469699425156359E-03 max=   6.8420831486068704E-03 ocean_model-v
v-point: c=     18606 ocean_model-v
h-point: mean=   3.6490088139048595E+02 min=   9.9999999999999915E-04 max=   5.6265092225099863E+02 ocean_model-h
h-point: c=     18673 ocean_model-h

Min, max, mean, bitcount for every diagnostic

Bit Reproducibility

Verification requires bit reproducibility

Identical code and input, different math libraries (c/o Foone)

This is two instances of the opening demo of the 1991 Super Nintendo game, "Pilotwings".

In the initial release, the plane would make a graceful landing. But in a later version, the plan would crash spectacularly.

People later inspected the data and discovered that the instruction code was identical. So what happenend?

The cartridge included a mathematical coprocessor (NEC DSP-1) for its 3D calculations. The second iteration of the coprocessor (DSP-1b) reordered some of its arithmetic to improve the accuracy of various calculations.

Unfortunately this also resulted in tiny course corrections in the flight, which eventually led to the demise of the biplane in the video.

The moral of the story is that when working with limited precision, one must be very careful!

(Confession: this was fixed-precision arithmetic, but it's basically the same problem.)

Floating Point Review

\[\phi \equiv (-1)^{\color{yellow}s} \times 2^{\color{aquamarine}M} \times (1 + {\color{pink}\alpha})\]

Smallest fractional diff: \(2^{-52} \approx 2.2 \times 10^{-16}\)
17 digits to uniquely specify a result

Addition Associativity

What is \(10^{-16} + 1 - 1\)?

\[\begin{aligned} (10^{-16} + 1) - 1 &= 0 \\ 10^{-16} + (1 - 1) &\equiv 10^{-16} \end{aligned}\]

Residuals below \(2\times10^{-16}\) may be lost.

More Addition Examples

Let \(s = 1 + 2 \times 10^{-16}\). What is \((s + 1) - 1\)?

\[\begin{aligned} s + 1 &= 2 \\ (s + 1) - 1 &= 1 \neq s \end{aligned}\]

Manipulation of \(s\) shifted the least resolvable value.

Multiplication associativity

If \(a = b = 1.5\), and \(c = 1 + 2^{-52}\), then

\[\begin{aligned} (a \times b) \times c &\equiv 2.25 + 2^{-51} \\ a \times (b \times c) &\equiv 2.25 + 2^{-50} \end{aligned}\]

(Actual results depend on rounding rules)

Sample program

program rounding
  use iso_fortran_env, only : real64
  implicit none

  real(kind=real64) :: a, b, c

  a = 1.5
  b = 1.5
  c = 1.0000000000000002_real64

  print '(a, es23.17)', "(a * b) * c = ", (a * b) * c
  print '(a, es23.17)', "a * (b * c) = ", a * (b * c)
end program rounding

Integrity of parentheses

V&V requires integrity of parentheses

GCC Fortran

gfortran -fprotect-parens ...    # default
gfortran -Ofast ...              # Sets -fno-protect-parens

Intel Fortran

ifort -assume protect-parens     # Not default

Note: Fortran requires this!

Division Performance

Minimize division operations:

x = a / b / c           ! Bad
x = a / (b * c)         ! Good

y = 1. / (1. + 1./c)    ! Bad
y = c / (c + 1.)        ! Good

Store common divisions:

I_dx = 1.0 / dx
dudx = I_dx * (u(i+1) - u(i))
dvdx = I_dx * (v(i+1) - v(i))

Divisions are slower and more unpredictable

Parallel Summation

How to compute reproducible means or global sums?

Enforce ordering

\[\sum{\phi} = (\phi_1 + (\phi_2 + ( \phi_3 + ... )))\]
Fixed-precision arithmetic

Associative Scaling

Recall the floating point format

\[\phi \equiv (-1)^{\color{yellow}s} \times 2^{\color{yellow}M} \times (1 + {\color{yellow}\alpha})\]

Power-of-two multiplication is associative

\[2^N \times \phi \times 2^{-N} \equiv \phi\]

Dimension Scaling

Fields rescaled by dimensions should be invariant

\[\begin{aligned} u^{n+1} &= u^{n} + \Delta t \times \mathcal{F} \\ {\color{yellow}2^{L-T}} u^{n+1} &= {\color{yellow}2^{L-T}} u^{n} + {\color{yellow}2^T} \Delta t \times {\color{yellow}2^{L - 2T}} \mathcal{F} \end{aligned}\]

Dimensional factors

Unit	Scaling	Name
s	T	Time
m	L	Horizontal length
m	H	Layer thickness
m	Z	Vertical length
kg/m3	R	Density
J/kg	Q	Enthalpy

Defining Dimensions

Input parameters

call get_param(... , "DT", ... , scale=US%s_to_T)

Explicit constants

eps_vel = 1.0e-10 * US%m_s_to_L_T
ustar = 0.01 * US%m_to_Z * US%T_to_s

Diagnostic registration

call register_diag_field(..., "u", ... , conversion=US%L_T_to_m_s)

Index Rotation

Rotation Invariance

Solutions must be invariant to index rotation, e.g.:

\[\phi(i',j') = \phi(j, N-i)\]

Both fields and coordinates are remapped.

Note: \(u\) and \(v\) are velocities along \(i\) and \(j\)!

Rotational Consistency

beta_topo_x = -CS%MEKE_topographic_beta * FatH * 0.5 * ( &
              (G%bathyT(i+1,j)-G%bathyT(i,j)) * G%IdxCu(I,j)  &
          / max(G%bathyT(i+1,j),G%bathyT(i,j), GV%H_subroundoff) &
      +       (G%bathyT(i,j)-G%bathyT(i-1,j)) * G%IdxCu(I-1,j) &
          / max(G%bathyT(i,j),G%bathyT(i-1,j), GV%H_subroundoff) )

beta_topo_y = -CS%MEKE_topographic_beta * FatH * 0.5 * ( &
              (G%bathyT(i,j+1)-G%bathyT(i,j)) * G%IdyCv(i,J)  &
          / max(G%bathyT(i,j+1),G%bathyT(i,j), GV%H_subroundoff) + &
              (G%bathyT(i,j)-G%bathyT(i,j-1)) * G%IdyCv(i,J-1) &
          / max(G%bathyT(i,j),G%bathyT(i,j-1), GV%H_subroundoff) )

Index rotation ensures directional consistency

Invariant stencils

\(\phi^{(c)}_{i,j} = \frac{1}{4} (\phi_A + \phi_B + \phi_C + \phi_D)\)

	\(\frac{1}{4} ( (\phi_A + \phi_B) + (\phi_C + \phi_D) )\)
	\(\frac{1}{4} ( (\phi_A + \phi_D) + (\phi_B + \phi_C) )\)

Rotational ordering

When all else fails, reorder the algorithm:

subroutine advect_tracer(...)
   ! ...
   x_first = modulo(turns, 2) == 1
   if (x_first) then
      call advect_x(...)
      call advect_y(...)
   else
      call advect_y(...)
      call advect_x(...)
   endif
end subroutine advect_tracer

Summary

MOM6 test suite:
- Verification of design requirements
  - Universal invariance rules
- Validation of solutions
  - Site-specific regression tests
Bit reproducibility is essential, and achievable!
Over 40 bugs have been detected and fixed

Dimensional scaling example

https://github.com/NOAA-GFDL/MOM6/pull/921

Kd_lay(i,j,k-1) = Kd_lay(i,j,k-1) + 0.5**KS_extra(i,K)
Kd_lay(i,j,k)   = Kd_lay(i,j,k)   + 0.5**KS_extra(i,K)

\(\ldots + \left(\tfrac{1}{2}\right)^{\kappa_S}\)?

Rotational example

https://github.com/NOAA-GFDL/MOM6/pull/1050

subroutine thickness_diffuse_full
   !...
   Work_u(I,j) = Work_u(I,j) + G_scale * (...)
   Work_v(i,J) = Work_v(i,J) - G_scale * (...)
   !...
end subroutine thickness_diffuse_full

Indexing example

Assumed 1-based start index

subroutine register_time_deriv(...)
   real, dimension(:,:,:), target :: f_ptr
   real, dimension(:,:,:), target :: deriv_ptr
   ! ...
end subroutine register_time_deriv

Fails for 0-based symmetric memory grids!

Another indexing example

78d2dc3ee9a018f30bc666bd574e21fb7786403d

Extended domain to accommodate symmetric grids:

do J=js,je ; do i=is,ie
   h_vel = 0.5*((htot_fast(i,j) + htot_fast(i+1,j)) + h_neglect)
   uDml_diag(I,j) = uDml_diag(I,j) / (0.01*h_vel) * G%IdyCu(I,j) * (PSI(0.)-PSI(-.01))
enddo ; enddo

do J=js,je ; do i=is-1,ie
   h_vel = 0.5*((htot_fast(i,j) + htot_fast(i+1,j)) + h_neglect)
   uDml_diag(I,j) = uDml_diag(I,j) / (0.01*h_vel) * G%IdyCu(I,j) * (PSI(0.)-PSI(-.01))
enddo ; enddo

Development Guidelines

Use parentheses!
1. Are they honored?
2. Am I preserving residuals?
Use reproducing_sum()
1. Even better: Don't do global sums!
Assign dimensions
Use rotationally invariant stencils
Test early and often