NOAAGFDL / Princeton U.
23 Sept 2021
Reproducible floating point arithmetic is essential
Step Days Energy/Mass [m2 s2] Mean Sea Level [m] ...
0 0.00 7.2161166068132286E27 1.8190E12 ...
12 0.50 2.7781004671136538E04 1.1369E12 ...
24 1.00 2.7734897826598717E04 1.8190E12 ...
Based on global metrics (energy, mass, etc)
upoint: ocean_modelu
min = 6.7187595818683776E03 max = 3.3480219779204019E02
mean = 1.1239682303793666E04 bits = 21851
vpoint: ocean_modelv
min = 8.3469699425156359E03 max = 6.8420831486068704E03
mean = 1.2076392816784489E03 bits = 18606
hpoint: ocean_modelh
min = 9.9999999999999915E04 max = 5.6265092225099863E+02
mean = 3.6490088139048595E+02 bits = 18673
...
Min, max, mean, bit count for every diagnostic
Test  Description 

grid  Symmetric/Asymmetric memory grids 
layout  1×1 and 2×1 domain decomposition 
restart  Restart at midrun 
dimension  Dimensional scaling 
rotation  Index rotation 
openmp  OpenMP (singlethread) 
\[\phi = (1)^\color{yellow}{s} \times 2^\color{aquamarine}{M} \times (1 + \color{pink}{\alpha})\]
What is \(10^{16} + 1  1\)?
\[\begin{aligned} (10^{16} + 1)  1 &\equiv 0 \\ 10^{16} + (1  1) &= 10^{16} \end{aligned}\]
Residuals below ULP (\(2\times10^{16}\)) are lost.
Let \(s = 1 + 2 \times 10^{16}\). What is \((s + 1)  1\)?
\[\begin{aligned} s + 1 &= 2 \\ (s + 1)  1 &= 1 \neq s \end{aligned}\]
Manipulation of \(s\) shifts ULP to \(4 \times 10^{16}\).
If \(a = b = 1.5\), and \(c = 1 + 2^{52}\), then
\[\begin{aligned} (a \times b) \times c &\equiv 2.25 + 2^{51} \\ a \times (b \times c) &\equiv 2.25 + 2^{50} \end{aligned}\]
Ambiguous ordering is rejected:
\[x + y + z\]
Parentheses are required:
\[\begin{aligned} (x + y) + z \\ x + (y + z) \end{aligned}\]
Select to optimize accuracy and performance.
GCC Fortran
gfortran fprotectparens ... # default
gfortran Ofast ... # Sets fnoprotectparens
Intel Fortran
ifort assume protectparens # Not default
Parentheses work, but have cumulant errors:
\[\sum{\phi} = (\phi_1 + (\phi_2 + ( \phi_3 + ... )))\]
Fixedprecision is accurate and independent of order:
The sum()
intrinsic is ambiguous!
How is sin(x)
computed? Ambiguous!
\[f(48^\circ) = 2 \Omega \sin \left( \frac{48 \pi}{180} \right)\]
glibc 2.22: 0.108381727637274115E03 (3F1C695FE71A3FE4)
2.26: 0.108381727637274128E03 (3F1C695FE71A3FE5)
Other compilers may not even use libm!
We avoid transcendentals where possible, and manage dependencies when necessary.
How to evaluate \(z^6\)?
These forms are ambiguous:
z6 = z * z * z * z * z * z
z6 = z**6
Compilers may use libm pow()
, also ambiguous.
We recommend:
z3 = z * z * z
z6 = z3 * z3
(Why not (z * z) * z
?)
Although \(0 = 0\), bit count will differ.
\[\begin{aligned} \phi \times 0 = \begin{cases} 0 & \text{if $\phi \geq 0$} \\ 0 & \text{if $\phi < 0$} \end{cases} \end{aligned}\]
\(0 \Leftrightarrow 0\) transitions can detect unexpected values.
But if unresolvable, add a zero: \(0 + 0 \Rightarrow +0\).
(e.g. min()
intrinsic)
From the seawater equation of state:
intz(m) = &
g_Earth * dz * ((p0 + p_ave) * (I_Lzz * I_al0)  rho_ref_mks) &
 2.0 * eps * I_Rho * (lambda * I_al0**2) * eps2 * ( &
C1_3 + eps2 * (0.2 + eps2 * (C1_7 + C1_9 * eps2)) &
)
Can we ensure expressions are dimensionally correct?
Recall the floating point format
\[\phi = (1)^{\color{yellow}s} \times 2^{\color{yellow}M} \times (1 + {\color{yellow}\alpha})\]
Poweroftwo multiplication is associative
\[2^\color{yellow}{N} \times \phi \times 2^\color{yellow}{N} = \phi\]
This provides a basis for bitwise dimensional scaling.
Fields rescaled by dimensions should be invariant
\[\begin{aligned} u^{n+1} &= u^{n} + \Delta t \times \mathcal{F} \\ \color{yellow}{2^{LT}} u^{n+1} &= \color{yellow}{2^{LT}} u^{n} + \color{yellow}{2^T} \Delta t \times \color{yellow}{2^{L  2T}} \mathcal{F} \end{aligned}\]
\[\begin{aligned} u_t + u u_x + v u_y &= g h_x \\ v_t + u v_x + v v_y &= g h_y \\ h_t + h u_x + h v_y &= 0 \\ \end{aligned}\]
Dimensions:

Invariants:

Unit  Scaling  Name 

s  T  Time 
m  L  Horizontal length 
m  H  Layer thickness 
m  Z  Vertical length 
kg/m3  R  Density 
J/kg  Q  Enthalpy 
Input parameters
call get_param(... , "DT", ... , scale=US%s_to_T)
Explicit constants
eps_vel = 1.0e10 * US%m_s_to_L_T
ustar = 0.01 * US%m_to_Z * US%T_to_s
Diagnostic registration
call register_diag_field(..., "u", ... , conversion=US%L_T_to_m_s)
"US" contains userdefined poweroftwo rescalings.
PFu(I,j,k) = (((za(i,j)*dp(i,j) + intp_dza(i,j,k))  &
(za(i+1,j)*dp(i+1,j) + intp_dza(i+1,j,k))) + &
((dp(i+1,j)  dp(i,j)) * intx_za(I,j)  &
(p(i+1,j,K)  p(i,j,K)) * intx_dza(I,j,k)) ) * &
(2.0*G%IdxCu(I,j) / ((dp(i,j) + dp(i+1,j)) + dp_neglect))
PFv(i,J,k) = (((za(i,j)*dp(i,j) + intp_dza(i,j,k))  &
(za(i,j+1)*dp(i,j+1) + intp_dza(i,j+1,k))) + &
((dp(i,j+1)  dp(i,j)) * inty_za(i,J)  &
(p(i,j+1,K)  p(i,j,K)) * inty_dza(i,J,k))) * &
(2.0*G%IdyCv(i,J) / ((dp(i,j) + dp(i,j+1)) + dp_neglect))
Can we ensure consistency of expressions like this?
Rotate input fields, forcing, coordinates.
(x,y)
and (u,v)
describe first and second index
Read inputs on coupler grid Move to rotated MOM6 grid Derotate fields sent back to coupler and output 
For 90° rotations:
Scalar 

Array pair 

Vector 

All quarterturn rotations are supported.
\(\phi^{(c)}_{i,j} = \frac{1}{4} (\phi_A + \phi_B + \phi_C + \phi_D)\)
\(\frac{1}{4} ( (\color{LightCoral}{\phi_A} + \color{LightCoral}{\phi_B}) + (\color{LightSkyBlue}{\phi_C} + \color{LightSkyBlue}{\phi_D}) )\) \(\frac{1}{4} ( (\color{LightCoral}{\phi_A} + \color{LightSkyBlue}{\phi_C}) + (\color{LightCoral}{\phi_B} + \color{LightSkyBlue}{\phi_D}) )\) 

\(\frac{1}{4} ( (\color{LightCoral}{\phi_A + \phi_D}) + (\color{LightSkyBlue}{\phi_B + \phi_C}) )\) 
subroutine advect_tracer(...)
! ...
x_first = modulo(turns, 2) == 1
if (x_first) then
call advect_x(...)
call advect_y(...)
else
call advect_y(...)
call advect_x(...)
endif
end subroutine advect_tracer
When all else fails, reorder the algorithm.
Kinetic energy calculation:
tmp1(i,j,k) = &
0.25 * KE_scale_factor * (areaTm(i,j) * h(i,j,k))) &
* (u(I1,j,k)**2 + u(I,j,k)**2 + v(i,J1,k)**2 + v(i,J,k)**2)
tmp1(i,j,k) = &
(0.25 * KE_scale_factor * (areaTm(i,j) * h(i,j,k))) &
* ((u(I1,j,k)**2 + u(I,j,k)**2) + (v(i,J1,k)**2 + v(i,J,k)**2))
Salinity contribution to diffusivity:
Kd_lay(i,j,k1) = Kd_lay(i,j,k1) + 0.5**KS_extra(i,K)
Kd_lay(i,j,k) = Kd_lay(i,j,k) + 0.5**KS_extra(i,K)
\(\ldots + \left(\tfrac{1}{2}\right)^{\kappa_S}\)?
subroutine thickness_diffuse_full
! ...
Work_u(I,j) = Work_u(I,j) + G_scale * (...)
Work_v(i,J) = Work_v(i,J)  G_scale * (...)
!...
end subroutine thickness_diffuse_full
In general, we value reproducibility over speed.
sum()
, sin()
, ...