The MOM6 Development Cycle

Marshall Ward

NOAA GFDL

2023-11-09

MOM6 Ocean Model

MOM6 sensitivity

Minor deviation leads to \(O(1)\) difference

Reproducibility Rules

  • Numerical expressions must be bit-reproducible
  • Code must not change existing solutions
  • Changes must not disrupt existing research

What we Don't require

When can answers differ?

  • Aggressive optimization (-O3 -mavx ...)
  • Hardware (Intel, AMD, A64FX, ...)
  • Compilers (gcc, ifort, ...)
  • Libraries (libm.a, ...)

But we still try to maximize reproducibility.

Float Order-of-Operations

What is \(10^{-16} + 1 - 1\)? Ambiguous!

\[\begin{aligned} (10^{-16} + 1) - 1 &\equiv 0 \\ 10^{-16} + (1 - 1) &= 10^{-16} \end{aligned}\]

Residuals below ULP \(\left(2 \times 10^{-16}\right)\) are lost.

Use parentheses to set the order of operations

Transcendentals

How is sin(x) computed? Ambiguous!

\[f(48^\circ) = 2 \Omega \sin \left( \frac{48 \pi}{180} \right)\]

glibc 2.22: 0.108381727637274115E-03 (3F1C695FE71A3FE4)
      2.26: 0.108381727637274128E-03 (3F1C695FE71A3FE5)

Avoid transcendentals where possible, manage dependencies when necessary.

Global Summation

The order of sum() is ambiguous!

Parentheses work, but may have cumulant errors:

\[\sum{\phi} = \phi_1 + (\phi_2 + ( \phi_3 + ... ))\]

Sums are stored in fixed-precision using six integers:

image

GFDL PR life Cycle

image

  1. User submits to node by PR
  2. Automated verification testing
  3. Code review (by a human)
  4. Test for regressions on production machine
  5. Rebase into node codebase

Verification Testing

image

CI testing is platform independent

State of the Model

Global energy tracked to full precision

Step   Days   Energy/Mass [m2 s-2]     Mean Sea Level [m]   ...

   0   0.00   7.2161166068132286E-27   1.8190E-12           ...
  12   0.50   2.7781004671136538E-04   1.1369E-12           ...
  24   1.00   2.7734897826598717E-04   1.8190E-12           ...

Diagnostics use multiple metrics

u-point: ocean_model-u
  min  = -6.7187595818683776E-03  max  =  3.3480219779204019E-02
  mean =  1.1239682303793666E-04  bits = 21851
v-point: ocean_model-v
  min  = -8.3469699425156359E-03  max  =  6.8420831486068704E-03
  mean =  1.2076392816784489E-03  bits = 18606
h-point: ocean_model-h
  min  =  9.9999999999999915E-04  max  =  5.6265092225099863E+02
  mean =  3.6490088139048595E+02  bits = 18673
...

Example Tests

"Local regression"

New code does not (inadvertently) change answers

Parallel Layout

1 × 2 and 2 × 1 domain decomponsitions

Restart/Pickup

One \(\Delta t\) run must equal two \(\tfrac{1}{2}\Delta t\) runs

Aggressive initialization

NaN-initialization arrays vs. uninitialized arrays

Dimension Test

Dimensionally correct equations are invariant to scaling:

\[\begin{aligned} u^{n+1} &= u^{n} + \Delta t \times \mathcal{F} \\ \color{yellow}{2^{L-T}} u^{n+1} &= \color{yellow}{2^{L-T}} u^{n} + \color{yellow}{2^T} \Delta t \times \color{yellow}{2^{L - 2T}} \mathcal{F} \end{aligned}\]

Solutions should also be invariant, e.g. \(L \rightarrow 2^N L\)

Floating Point Review

image

\[\phi = (-1)^{\color{yellow} s} \times 2^{\color{aquamarine} M} \times (1 + {\color{pink} \alpha})\]

  • Smallest fractional diff: \(\color{yellow}{2^{-52} \approx 2.2 \times 10^{-16}}\)
  • 17 digits to uniquely specify a result

Shallow Water Example

\[\begin{aligned} u_t + u u_x + v u_y &= -g h_x \\ v_t + u v_x + v v_y &= -g h_y \\ h_t + h u_x + h v_y &= 0 \\ \end{aligned}\]

Dimensions:
  • \(L\) (horiz. length)
  • \(T\) (time)
  • \(H\) (layer depth)
Invariants:
  • \(\left[ u, v \right] = L T^{-1}\)
  • \(\left[ g \right] = L^2 H^{-1} T^{-2}\)

Rotation Test

Equations should be invariant to rotation

image image image image

Perhaps should be called an "index" rotation

image

Invariant stencils

\(\phi^{(c)}_{i,j} = \frac{1}{4} (\phi_A + \phi_B + \phi_C + \phi_D)\)

image

\(\frac{1}{4} ( ({\color{LightCoral}{\phi_A}} + {\color{LightCoral}{\phi_B}}) + ({\color{LightSkyBlue}{\phi_C}} + {\color{LightSkyBlue}{\phi_D}}) )\)

\(\frac{1}{4} ( ({\color{LightCoral}{\phi_A}} + {\color{LightSkyBlue}{\phi_C}}) + ({\color{LightCoral}{\phi_B}} + {\color{LightSkyBlue}{\phi_D}}) )\)

image \(\frac{1}{4} ( ({\color{LightCoral}{\phi_A + \phi_D}}) + ({\color{LightSkyBlue}{\phi_B + \phi_C}}) )\)

Descriptive Commits

commit d210cc6cdfd03150306c8ba41612e3380d66c281
Author: Robert Hallberg <Robert.Hallberg@noaa.gov>
Date:   Wed Aug 16 17:11:35 2023 -0400

    +Remove build_grid_arbitrary
    
      Removed the unused (and unusable) routine build_grid_arbitrary.
      This routine could not have been used because it had a hard-coded
      STOP call, and comments in it indicated that it should have been
      deleted in July, 2013.  The run-time parameter setting that would
      have triggered a call to this routine has been retained for now,
      but with a fatal error message explaining that this routine has
      not been implemented.   All answers are bitwise identical in any
      cases that ran before.

commit 615e57f854db8be8c75a9edba6bb05e3f04a6eb7
Author: raphael dussin <raphael.dussin@gmail.com>
Date:   Sat Oct 28 15:09:45 2023 -0400

    extension to the internal tides module (#481)
    
    the module in now able to read in tidal velocities for different
    tidal harmonics and distribute the energy and distribute TKE input over
    the different vertical modes. This involves upsizing dimensions of
    several arrays and mofiying some API. internal_tide_input_CS is
    promoted to public to facilitate the passing of energy input to
    MOM_internal_tides

commit ddb88f8c2fb36ce282cfdb34739a1c37ed369abd
Author: Cory Spencer Jones <spencerjones@tamu.edu>
Date:   Mon Oct 16 11:33:26 2023 -0500

    +Add timestamp and directory to particles restart
    
    The directory, time and timestamp variables are needed by the
    particle code in order to write better restart files. I have changed
    the particles_save_restart interface to add these variables. I have
    also removed the option to pass temperature and salinity to
    particles_save_restart, because these variables are not useful for
    restart.

commit 503a9f4c5f585e258a3d5810cad0b4af073c4fb8
Author: Alex Huth <huthalexandere@gmail.com>
Date:   Fri Oct 27 06:59:36 2023 -0400

    ice shelf front advection: When determining a reference thickness
    for a partially-filled cell, add the reference thickness
    contribution from a neighboring filled cell proportionate to its
    flux into the partially-filled cell. This is more accurate than
    simply taking the average thickness of all neighboring filled cells.
    Also fixed incorrect bounds. (#475)

commit f514529a8a299b8e84512a10062aa524f0a23478
Author: Alex Huth <huthalexandere@gmail.com>
Date:   Thu Oct 26 15:11:12 2023 -0400

    Ice sheet thickness boundary condition (#474)
    
    * allow for assigned ice shelf thickness where hmask==3, but still
      solve for ice sheet velocity

Also: No "fixed my typo" commits!

image

Commit messages to me are almost as important as the code change itself.

If you can explain the code change to me, I will trust the code.

Linus Torvalds, Linux OSS 2020

Regression Suite

image

Validate 61 tests using 3 compilers

Tangled History

image

Having many active users leads to a non-sequential history

This severely complicates our ability to track bugs and regressions!

Rewrite History!

image

We aggressively rebase to create a new linear history

However, note that we also clobber user history

We cannot clobber external research codebase!

How to preserve existing runs?

MOM6 Consortium

image

What is the MOM6 Consortium?

image

Codebase is governed by a consortium of research groups.

All changes to the "hub" must be tracked and preserved.

Groups manage their own branch, and contribute to main.

Code Continuity

image
  • Merges to main are unanimous
  • All branches within a node are preserved
  • Nodes can easily collaborate on joint projects

Summary

Bit reproducibility?

Restrict model to non-ambiguous operations

Preserve existing solutions?

Regression and self-consistency testing

Code preservation?

Mutual governance by a consortium of groups