05-Reading: Find_idea_of_DA


Data Assimilation

Publish Date: 2022-03-02

Update Date: 2024-07-29


`<center>`Reference: Attention-based Convolutional Autoencoders for 3D-Variational Data Assimilation (2020) `</center>`

English title: Attention-based Convolutional Autoencoders for 3D-Variational Data Assimilation
Journal: Computer Methods in Applied Mechanics and Engineering (LetPub: https://www.letpub.com.cn/index.php?page=journalapp&view=search)

This journal is also pretty good.

8.0, 171, 9.9, Zone 2
Authors: Julian Mack, Rossella Arcucci, Miguel Molina-Solana, Yi-Ke Guo
Affiliation: Data Science Institute, Imperial College London, UK
I read this one on 2022-02-28. It improves 3D-Var; does it train a neural network to represent the B matrix?

Abstract

We propose a new ‘Bi-Reduced Space’ approach to solving 3D Variational Data Assimilation using Convolutional Autoencoders. We prove that our approach has the same solution as previous methods but has significantly lower computational complexity; in other words, we reduce the computational cost without affecting the data assimilation accuracy.

We tested our proposal with data from a real-world application: a pollution model of a site in Elephant and Castle (London, UK) and found that we could

(1) reduce the size of the background covariance matrix representation by $O(10^3)$, and

reduce by, increase by
(2) increase our data assimilation accuracy with respect to existing reduced space methods.

Acknowledgements

This work is supported by the EPSRC, UK Grand Challenge grant “Managing Air for Green Inner Cities”(MAGIC) EP/N010221/1, by the EPSRC, UK Centre for Mathematics of Precision Healthcare EP/N0145291/1 and the EP/T003189/1 Health assessment across biological length scales for personal pollution exposure and its mitigation (INHALE). Thanks to Dr. Laetitia Mottet for the set up of the full model in Fluidity. M. Molina-Solana was supported by European Union’s H2020 MSCA-IF (ga. No.743623) and Athenea3i (ga. No.754446) programmes.

9. Conclusions and future work

We have presented a new Bi-reduced space 3D-VarDA formulation and show that, in combination with the Zhou et al or ‘Tucodec’ image compression CAE, this method gives superior data assimilation performance in comparison with reduced space VarDA regardless of the parameters used in the latter case. We have demonstrated that our method is also faster in the majority of scenarios. On the theoretical side, we show that our method produces approximately equivalent solutions to the traditional method at lower computational complexity. Unlike the previous approach which is in $O(M^2)$ for large M, our method does not penalise the collection of more observation data.

We have released our work in a well tested Python module VarDACAE.

↑ Released as a Python module

There were many extensions to this work which we would have liked to explore further. We feel that the most important of these is the validation of our hypothesis that it is possible to create an observation encoder network $f_o$ to calculate the latent misfits $d_l$. We would also have liked to apply our approach to 4D-VarDA, validate it on other data sets and investigate alternatives to the L-BFGS minimisation routine. A more substantial extension would involve integrating our method with CAE-based ROM approaches to produce a single end-to-end network for reduced space data assimilation and we believe this would be complemented by the use of data assimilation localisation techniques [66]. Finally, there is also potential for the use of VAEs within the proposed system to enforce orthogonality in the CAE latent dimension.

github code

VarDACAE：https://github.com/scheng1992/Data_Assimilation

Readme.md

—VarDACAE

This module is used to create Convolutional AutoEncoders for Variational Data Assimilation. A user can define, create and train an AE for Data Assimilation with just a few lines of code. It is the accompanying code to the paper here, published in Computer Methods in Applied Mechanics and Engineering.

—Introduction

Data Assimilation (DA) is an uncertainty quantification technique used to reduce the error in predictions by combining forecasting data with observation of the state. The most common techniques for DA are Variational approaches and Kalman Filters.

↑ Introduction to DA

In this work, we propose a method of using Autoencoders to model the Background error covariance matrix, to greatly reduce the computational cost of solving 3D Variational DA while increasing the quality of the Data Assimilation.

↑ The background error covariance is modelled with an autoencoder

—Data

The data used in this paper is owned by the Data Science Institute, Imperial College, London. If you do not have access to this data, please see the section below on training a model with your own data.

—Installation

Install vtk by navigating to this link and installing the version applicable to your system.
Navigate to the base directory and run:
```
pip install -e .
```
Run pytest from the home directory to ensure correct installation.

—Tests

From the project home directory run pytest.

—Getting Started

—Settings Instance

—Train a model on your own data

1 Introduction

Data Assimilation (DA) is an uncertainty quantification technique in which observation data and a forecasting model are used in tandem to generate predictions that are more accurate than those that would be produced using either component independently.

↑ Definition of DA

DA is computationally costly for large systems [1] and under operational constraints, it is often necessary to solve the problem in a reduced space in order to achieve real-time assimilation.

↑ "operational" can probably be translated as 业务化 (run as a routine production service)

In most relevant DA operational software, a variable transformation is performed on the variational functional to reduce the computational cost needed for computing the covariance matrix explicitly; to reduce the space, only Empirical Orthogonal Functions (EOFs) of the first largest eigenvalues of the error covariance matrix are considered.

↑ Variable transformation; Empirical Orthogonal Functions (EOFs)

Since its introduction to meteorology by Edward Lorenz, EOFs analysis, which is essentially based on a Truncated Singular Value Decomposition (TSVD), has become a fundamental tool in computational fluid dynamic modelling for data diagnostics and dynamical model reduction.

↑ EOF analysis is based on Truncated Singular Value Decomposition (TSVD)

Real world applications of TSVD (EOFs) basically exploit the fact that these methods allow a decomposition of a data function into a set of orthogonal functions, which are designed so that only a few of these functions are needed in lower-dimensional approximations.

↑ How TSVD works: dimensionality reduction

exploit: to make use of

Nevertheless, the accuracy of the solution obtained by truncating exhibits a high sensitivity to the variation of the value of the truncation parameter [2,3], so that a suitable truncation parameter is needed. This is a severe drawback of truncation-based methods and limits the utility of operational software based on these methods.

↑ Drawback of TSVD: the result is extremely sensitive to the truncation parameter
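
To make the truncation sensitivity concrete, here is a minimal NumPy sketch. It is not from the paper: the snapshot matrix, its size and the noise level are made-up assumptions, chosen so that the data have about five dominant modes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical snapshot matrix: 200 grid points x 50 snapshots, built from
# 5 smooth spatial modes plus a little noise, so its effective rank is about 5.
n_points, n_snapshots, true_rank = 200, 50, 5
grid = np.linspace(0.0, 1.0, n_points)
modes = np.stack([np.sin((k + 1) * np.pi * grid) for k in range(true_rank)], axis=1)
coeffs = rng.normal(size=(true_rank, n_snapshots))
snapshots = modes @ coeffs + 0.01 * rng.normal(size=(n_points, n_snapshots))

# Full SVD once; truncation keeps only the leading singular triplets (EOFs).
U, s, Vt = np.linalg.svd(snapshots, full_matrices=False)


def truncated_reconstruction(rank):
    """Rebuild the snapshots from the leading `rank` singular triplets."""
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]


def relative_error(rank):
    """Relative Frobenius-norm error of the rank-`rank` reconstruction."""
    approx = truncated_reconstruction(rank)
    return np.linalg.norm(snapshots - approx) / np.linalg.norm(snapshots)


errors = {rank: relative_error(rank) for rank in (1, 3, 5, 10)}
for rank, err in errors.items():
    print(f"rank {rank:2d}: relative error {err:.4f}")
```

In this toy setting the error stays large until the truncation rank reaches the number of dominant modes and then drops sharply, which is one way to picture why the choice of truncation parameter matters so much.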
Convolutional Autoencoders (CAEs) are a type of neural network that can be used to compress inputs that exhibit local smoothness. CAEs have had huge successes in computer vision [4,5] and particularly in image compression [6,7].

↑ How a Convolutional Autoencoder (CAE) works: it compresses inputs that are locally smooth.

In this work, we use CAEs to produce a reduced space in which DA can be performed efficiently. We show that our approach has lower online complexity than a DA with TSVD while also giving equivalent forecasts.

↑ DA-CAE compared with DA-TSVD

The structure of this paper is as follows:

in Section 2 we cover related work and we present the contribution of the present work.
Section 3 provides preliminary concepts and definitions,
Section 4 provides background on Autoencoders, and Section 5 introduces our theoretical contribution.
As the success of our approach is heavily conditioned on the choice of a CAE architecture, in Section 6 we summarise the results of our extensive architecture search before evaluating our approach against existing VarDA methods in Section 7.
The paper ends with a discussion in Section 8 and some conclusions and further remarks in Section 9.

Forecasting models introduce uncertainty from numerous sources. These include, but are not limited to, uncertainty in initial conditions, imperfect representations of the underlying physical processes and numerical errors. As a result, a model without access to real-time data will accumulate errors until its predictions no longer correspond to reality [8]. Similarly, all observations will have an irreducible uncertainty as a result of imperfect measuring devices.

↑ Sources of uncertainty in forecasting models include...

The key idea in DA is that the overall uncertainty in a forecast can be reduced by producing a weighted average of model forecasts and observations.

↑ QT: how should "a weighted average" in DA be understood?
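
One way to read the "weighted average" is the scalar case, where forecast and observation are each weighted by the inverse of their error variance. The sketch below is my own illustration under the assumption of Gaussian errors with known variances; the function name `combine` and the numbers are made up.

```python
def combine(forecast, forecast_var, observation, observation_var):
    """Inverse-variance weighted average of a forecast and an observation."""
    # Weight given to the observation (the scalar Kalman gain).
    weight_obs = forecast_var / (forecast_var + observation_var)
    analysis = forecast + weight_obs * (observation - forecast)
    # The combined estimate has a smaller variance than either input.
    analysis_var = (1.0 - weight_obs) * forecast_var
    return analysis, analysis_var


# Hypothetical values: an uncertain forecast and a more precise observation.
analysis, analysis_var = combine(forecast=20.0, forecast_var=4.0,
                                 observation=22.0, observation_var=1.0)
print(analysis, analysis_var)
```

The analysis lands closer to the more precise input (here the observation), and its variance is below both input variances, which is the sense in which the overall uncertainty is reduced.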

The canonical application of Data Assimilation (DA) [9,10] is Numerical Weather Prediction (NWP) [11–13] but the technique has been utilised in contexts as diverse as oceanic modelling [14,15], solar wind prediction [16] and inner city pollution modelling [17,18].

Our proposed approach is agnostic to the details of the forecasting model (i.e. it is non-intrusive) and is therefore applicable to any DA problem in which a reduced order system is used.

↑ How should "non-intrusive" be understood?

Our proposed formulation of Variational DA [12,13] extends the incremental formulation [11]. In 1992, Parrish et al. [19] proposed using a Control Variable Transform (CVT) to reduce the space of the background error covariance matrix B by performing Cholesky factorisation as $B = VV^T$. Since then, many authors have used eigenanalysis techniques such as PCA or TSVD to reduce the rank of V [17]. In this work, we propose replacing these eigenanalysis approaches with a CAE that learns to compress V more efficiently, and with less information-loss than the removal of eigen-modes.

↑ The paper's novelty: using a CAE to compress V more efficiently
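
To see what the Control Variable Transform does before any rank reduction, the sketch below checks numerically that solving in the control variable w (with the increment written as V w) gives the same increment as the direct formula with B. This is a toy check under my own assumptions: a linear observation operator H, a diagonal R and random matrices. It is not the paper's implementation and not the VarDACAE API.

```python
import numpy as np

rng = np.random.default_rng(1)
n_state, n_obs = 6, 3

# Hypothetical background error covariance B (symmetric positive definite).
A = rng.normal(size=(n_state, n_state))
B = A @ A.T + n_state * np.eye(n_state)
V = np.linalg.cholesky(B)               # B = V V^T

H = rng.normal(size=(n_obs, n_state))   # linear observation operator (assumed)
R = 0.5 * np.eye(n_obs)                 # observation error covariance (assumed)
d = rng.normal(size=n_obs)              # misfit d = y - H x_b

# Control-variable form: minimise
#   J(w) = 0.5 w^T w + 0.5 (d - H V w)^T R^-1 (d - H V w).
# Setting the gradient to zero gives
#   (I + V^T H^T R^-1 H V) w = V^T H^T R^-1 d.
HV = H @ V
R_inv = np.linalg.inv(R)
w = np.linalg.solve(np.eye(n_state) + HV.T @ R_inv @ HV, HV.T @ R_inv @ d)
increment_cvt = V @ w

# Direct form for comparison: dx = B H^T (H B H^T + R)^-1 d.
increment_direct = B @ H.T @ np.linalg.solve(H @ B @ H.T + R, d)

print(np.allclose(increment_cvt, increment_direct))
```

The point of the transform is that B never has to be inverted; reduced-space methods then go one step further and replace V by a low-rank or learned compressed representation.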

This work builds on a previous publication [17] in which TSVD was used to precondition V. The original authors used a test-site location in South London and synthetic data generated by Fluidity, an open-source finite-element fluid dynamic software (http://fluidityproject.github.io/). We test our approach on the same domain and data to enable a clear comparison. We find that our method gives considerably more accurate predictions and, in most cases, provides them sooner than the previous approach. In fact, our method is also more accurate (and much faster) than the CVT formulation of Parrish et al. [19].

In this paper we make the following contributions:

We propose a new ‘Bi-reduced space’ 3D Variational Data Assimilation (3D-VarDA) formulation that has an online complexity that is independent of the number of assimilated observations (i.e. it can be used with arbitrarily dense sensor networks). We show that our approach has lower online complexity than [17] while also giving equivalent forecasts.
We create and evaluate 3D extensions of a range of state-of-the-art CAEs for 2D image compression. To our knowledge, we are the first to extend the image compression network of [20] and image restoration GRDN of [21] to three-dimensions. We find that Zhou et al.’s attention-based model [20] performs best, and make some small improvements to this system including the replacement of vanilla residual blocks [22] with ‘NeXt’ residual blocks [23] in order to reduce decoder inference time.
This adapted CAE, in combination with our proposed DA formulation, achieves a substantial relative reduction in DA error of 37% compared with Arcucci et al.’s TSVD approach [17]. Depending on the number of assimilated observations, our method is up to 30 times faster. We discuss the speed–accuracy tradeoff at length in Section 7.
We release a well tested open-source Python module VarDACAE that enables users to easily replicate our experiments, use our model implementations, and train CAEs for any Variational data assimilation problem. The repository can be found at https://github.com/julianmack/Data_Assimilation.

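`<center>`Reference: Uncertainty analysis of a groundwater model based on data assimilation `</center>`

Journal: Journal of Glaciology and Geocryology

This journal could not be looked up.
Author: Chen Chong
Affiliation: China University of Petroleum (Beijing)
Novelty: uncertainty analysis via data assimilation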

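Abstract

The groundwater system in the middle and lower reaches of the Heihe River Basin is recharged by cryosphere meltwater and rainfall from the upper reaches. Cryosphere shrinkage caused by climate warming exposes the stability of this groundwater system to greater risk.

↑ Groundwater model; why the research matters

Groundwater models are an effective means of assessing groundwater system stability, but their parameters often carry large uncertainty. This paper therefore proposes an uncertainty analysis method based on data assimilation, which reduces model uncertainty by incorporating information from observations.

↑ DA: why apply it to a groundwater model

The proposed method was used to analyse the uncertainty of 13 parameters in a groundwater model of the middle reaches of the Heihe River Basin (built on MODFLOW). The influence of the algorithm's hyperparameters and their optimal values was discussed, and the uncertainty of the groundwater model parameters was analysed.

↑ DA: how it is applied to the groundwater model

The experiments show that data assimilation effectively reduces the uncertainty of groundwater model parameters, and that the type and number of observations play an important role in that reduction. Uncertainty differs between parameters and is larger in areas where surface water and groundwater interact frequently. The aquifer permeability coefficient, aquifer specific yield and irrigation return-flow coefficient have a significant effect on the simulated groundwater level, while the riverbed hydraulic conductivity has a larger effect on the simulated river discharge.

This study will provide a more reliable modelling approach for groundwater research and important support for research on the stability and sustainability of the groundwater-fed oasis ecosystems in the inland river region of Northwest China.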

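Conclusions

This paper proposes an uncertainty analysis method based on data assimilation that reduces the uncertainty of model parameters by incorporating information from observations. Using the previously built groundwater model of the middle reaches of the Heihe River Basin, the proposed algorithm was applied to analyse parameter uncertainty, explore how different hyperparameters affect the algorithm, analyse the uncertainty of the aquifer permeability coefficient, riverbed hydraulic conductivity, aquifer specific yield and irrigation return-flow coefficient, and assess how sensitive each coefficient is to the observations. The main conclusions are:

(1) Data assimilation based on Bayesian theory is an effective tool for uncertainty analysis. The ESMDA-EnKF algorithm updates the parameters by incorporating observations and effectively reduces their uncertainty. Statistics of the results show that, after the ESMDA-EnKF update, the parameter means converge to the neighbourhood of the optimal means and the uncertainty ranges shrink markedly.

(2) Hyperparameters differ in their effect on the algorithm. Adding observations increases how strongly the algorithm updates the model parameters and reduces their uncertainty, and adding a new type of observation reduces it further.

(3) Uncertainty differs between model parameters and is larger in areas where surface water and groundwater interact frequently. The aquifer permeability coefficient, aquifer specific yield and irrigation return-flow coefficient significantly affect the simulated groundwater level, while the hydraulic conductivity has a larger effect on the simulated river discharge.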

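0 Introduction

Groundwater system simulation has become a basic research method in the earth sciences. In earth system science it has been called a second Copernican revolution [1] and has proved to be one of the most valuable and practical tools [2-3]. A groundwater model can describe both the overall and the local behaviour of hydrological processes, select the best design through repeated simulation, and make scenario-based predictions that inform response strategies.

Many researchers have used models to study surface water, spring water and groundwater in the middle reaches of the Heihe River: Chen Chong et al. [4] built a groundwater model of the middle reaches and made a preliminary analysis of how upstream runoff and cultivated area affect groundwater resources; Wang Xusheng et al. [5] reviewed 20 years of groundwater modelling in the Heihe River Basin and pointed out that groundwater models should be better integrated with related processes (surface water, soil water, hydraulic engineering); Cheng Guodong et al. [6] described the progress and prospects of integrated eco-hydrological research in the Heihe River Basin.

↑ Heihe River

Current groundwater models are generally built on deterministic parameters, boundary conditions and mechanisms. However, the complexity and nonlinearity of real processes mean that models usually carry some uncertainty. As George put it, all models of real systems are "wrong" [7]: a model can never represent the real system perfectly. The sources of uncertainty in a groundwater model are usually divided into uncertainty in the model data, in the model structure and in the model parameters.

Data uncertainty generally arises from monitoring errors or missing data, and can only be reduced by improving monitoring technology and the frequency of data collection.
Structural and parameter uncertainty arise from the modelling process and from parameter calibration and validation. These uncertainties cause difficulties, and even risks, for subsequent research and decision-making. Uncertainty analysis is therefore essential when using a groundwater model.

↑ Sources of model uncertainty

William et al. [8] described several methods for estimating and analysing all types of uncertainty in a model. Beven [9] first proposed uncertainty analysis of hydrological models in 1989, while discussing the applicability of physical models. Uncertainty analysis methods for groundwater models fall into three main groups: Monte Carlo (MC) methods, moment equation methods and Bayesian methods [10].

The MC method is widely used for uncertainty analysis [11-15], but its drawback cannot be ignored: it needs a large number of parameter samples to guarantee correct and accurate results, so it is unsuitable for computationally expensive models.
The moment equation method solves directly for the statistical moments of the simulation results through stochastic partial differential equations and has seen initial use in groundwater model uncertainty studies [16-18].
The Bayesian method uses observations to correct the distribution of hydrogeological parameters. It can be used both for parameter identification and inversion [19] and for parameter uncertainty analysis. Its advantage is that updating the parameter distribution gives a more accurate assessment of model uncertainty. Relatively few studies have used Bayesian methods for groundwater model uncertainty analysis [20-21].

↑ Uncertainty analysis methods

Data assimilation (DA) is a parameter updating method based on Bayesian theory. The Kalman Filter (KF), proposed by Kalman, is a sequential data assimilation algorithm [22]: it uses observations to update the model state variables recursively while keeping the error of the state estimate minimal throughout the update. Since its introduction, the KF has been widely used to estimate model parameters and states.

Huang Chunlin et al. [23] used the Ensemble Kalman Filter (EnKF) to assimilate soil moisture observations into the Simple Biosphere Model 2 (SiB2) and examined a single-point soil moisture assimilation scheme for it;
Chu Nan et al. [24] went further and used a dual EnKF to estimate soil moisture and soil property parameters in SiB2 simultaneously, improving the model's soil moisture estimates.
Li et al. [25] used the EnKF to estimate the parameters and state variables of a groundwater model simultaneously.
Sly et al. used the EnKF to assimilate the output of the SWOT (Surface Water and Ocean Topography) model into GHMs (Global Hydrological Models), improving the accuracy of global-scale hydrological simulation.

↑ KF methods

However, only a few studies have used data assimilation for model uncertainty analysis.

For example, Li et al. [25] used ESMDA (Ensemble Smoother with Multiple Data Assimilation) to analyse parameter uncertainty in a hydrological model of the Yanqi Basin.

Building on the groundwater model of the middle reaches of the Heihe River Basin from earlier work [4], this paper combines ESMDA and EnKF to analyse the uncertainty of the groundwater model parameters, examines how the hyperparameters of ESMDA and EnKF affect the analysis, and uses the algorithm to quantify the uncertainty of the model parameters, their sensitivity to different observations and the uncertainty range of each parameter.

↑ The paper's novelty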

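1 Study area

2 Methods

This paper analyses the parameter uncertainty of the groundwater model of the study area with ESMDA-EnKF, a combination of ESMDA and EnKF.

Figure 2 shows the program flow for uncertainty analysis with the ESMDA algorithm.

Before the uncertainty analysis, the algorithm is initialised and the model parameters are configured: an ensemble of model parameters is generated from a given probability distribution (usually assumed normal or log-normal).
Each predicted parameter set is fed into the groundwater model to compute the model output;
The groundwater model outputs, the observations, the observation error covariance and the ensemble of predicted parameters are passed to the ESMDA-EnKF algorithm. The increment (innovation) is computed from the model outputs and the observation ensemble, the Kalman gain is computed from the error statistics of the ensemble members, the update is computed from the increment and the Kalman gain, and adding the update to the initial field gives the analysis ensemble;
The preset number of algorithm executions determines whether the stopping condition has been reached.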

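2.1 Ensemble Kalman Filter

The EnKF uses a Monte Carlo approach to generate a random ensemble of parameters, predicts the state variables, and updates them with the available observations. Its theory and algorithms are described in detail in the literature [37-38]; only the main working principle is reviewed here.

Define the ensemble of parameter and state vectors at time t
Forecast step from time t to time t+1
Analysis step at time t+1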
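
The formulas for these three steps were not carried over into these notes. As a rough stand-in, here is a minimal sketch of the analysis step in the common stochastic (perturbed-observation) form of the EnKF. The function name `enkf_analysis`, the array shapes and the toy numbers are my own assumptions, not the paper's notation.

```python
import numpy as np


def enkf_analysis(ensemble, predicted_obs, observations, obs_cov, rng):
    """Stochastic EnKF update.

    ensemble:      (n_state, n_members) forecast parameters and/or states
    predicted_obs: (n_obs, n_members) model output for each member
    observations:  (n_obs,) observed values
    obs_cov:       (n_obs, n_obs) observation error covariance
    """
    n_members = ensemble.shape[1]
    state_anom = ensemble - ensemble.mean(axis=1, keepdims=True)
    obs_anom = predicted_obs - predicted_obs.mean(axis=1, keepdims=True)
    # Sample covariances estimated from the ensemble.
    cov_state_obs = state_anom @ obs_anom.T / (n_members - 1)
    cov_obs_obs = obs_anom @ obs_anom.T / (n_members - 1)
    gain = cov_state_obs @ np.linalg.inv(cov_obs_obs + obs_cov)  # Kalman gain
    # Each member is updated against its own perturbed copy of the observations.
    perturbed = rng.multivariate_normal(observations, obs_cov, size=n_members).T
    return ensemble + gain @ (perturbed - predicted_obs)


# Toy example: a 2-component state of which only the first is observed.
rng = np.random.default_rng(2)
forecast = rng.normal(loc=0.0, scale=1.0, size=(2, 200))
predicted = forecast[:1, :]
obs, obs_cov = np.array([1.0]), np.array([[0.01]])
analysis = enkf_analysis(forecast, predicted, obs, obs_cov, rng)
print(analysis[0].mean(), analysis[0].var())
```

Because the observation is much more precise than the forecast here, the analysis mean of the observed component moves almost all the way to the observation and the ensemble spread collapses.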

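2.2 Ensemble Smoother with Multiple Data Assimilation

ESMDA is introduced in three parts: first the Ensemble Smoother (ES); then the inflation coefficient that is added to ES to form the ESMDA framework; finally the combination of ESMDA and EnKF, with pseudocode.

↑ Looking forward to the pseudocode

2.2.1 Ensemble Smoother (ES)

The EnKF is a sequential data assimilation algorithm: it initialises the model with the state at time t, predicts the state at time t+1, and at time t+1 uses observations to make a weighted update of the predicted state, giving the best estimate of the current state. It is mostly used to calibrate state variables in real time, so that the model can run in real time.

↑ A very good explanation of the EnKF as sequential assimilation: it is a real-time model.

ES, by contrast, is mostly used for parameter inversion and parameter uncertainty analysis, and updates the parameters with all available observations. The formula is:

↑ Is ES non-sequential assimilation?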

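2.2.2 ESMDA

In essence, ES feeds the observations from all times into the assimilation algorithm for a single parameter update [39]. ESMDA builds on ES by fixing a number of algorithm executions and assimilating the data multiple times.

↑ How ES and ESMDA are related

ESMDA does not simply repeat the ES procedure in each loop: in each loop it applies an inflation coefficient $\alpha_i$ to the observation error, and sets

Clearly $\alpha_i$ can be chosen in many ways, but [40] points out that inflation coefficients that decrease as the loop count increases bring no clear improvement in assimilation performance. The same inflation coefficient is therefore used in every loop.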
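
The condition on the inflation coefficients did not survive in these notes. In the standard ES-MDA formulation as I understand it (an assumption on my part, not copied from this paper), the observation error covariance $C_D$ is inflated to $\alpha_i C_D$ in loop $i$, and the coefficients over $N_a$ loops must satisfy:

$$
\sum_{i=1}^{N_a} \frac{1}{\alpha_i} = 1
$$

With the same coefficient in every loop this gives $\alpha_i = N_a$, which would be consistent with the later remark that the number of executions $N_a$ is also the inflation coefficient.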

2.2.3 ESMDA-EnKF

In each loop ESMDA-EnKF updates the parameters with the EnKF. The update formula is:

The implementation of ESMDA-EnKF is given by the following **pseudocode**:
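
The paper's pseudocode is not reproduced in these notes. The sketch below is my own reconstruction of a generic ES-MDA loop with an EnKF-style update and a constant inflation coefficient equal to the number of assimilations. The function `esmda`, the linear toy "model" `G` and all numbers are hypothetical, and the paper's actual algorithm may differ in its details.

```python
import numpy as np


def esmda(prior, forward, observations, obs_cov, n_assim, rng):
    """ES-MDA with a constant inflation coefficient alpha = n_assim.

    prior:   (n_param, n_members) samples from the prior parameter distribution
    forward: maps one parameter vector to the (n_obs,) model output
    """
    params = prior.copy()
    n_members = params.shape[1]
    alpha = float(n_assim)  # constant coefficient, so sum(1/alpha) over loops = 1
    for _ in range(n_assim):
        # Run the model for every ensemble member.
        outputs = np.column_stack([forward(params[:, j]) for j in range(n_members)])
        p_anom = params - params.mean(axis=1, keepdims=True)
        o_anom = outputs - outputs.mean(axis=1, keepdims=True)
        cov_po = p_anom @ o_anom.T / (n_members - 1)
        cov_oo = o_anom @ o_anom.T / (n_members - 1)
        # Kalman gain with the inflated observation error covariance.
        gain = cov_po @ np.linalg.inv(cov_oo + alpha * obs_cov)
        perturbed = rng.multivariate_normal(
            observations, alpha * obs_cov, size=n_members).T
        params = params + gain @ (perturbed - outputs)
    return params


# Toy problem: recover two parameters of a linear model from 10 noisy observations.
rng = np.random.default_rng(3)
true_params = np.array([2.0, -1.0])
G = rng.normal(size=(10, 2))
obs_cov = 0.01 * np.eye(10)
observations = G @ true_params + rng.multivariate_normal(np.zeros(10), obs_cov)
prior = rng.normal(size=(2, 100))
posterior = esmda(prior, lambda m: G @ m, observations, obs_cov, n_assim=2, rng=rng)
print(posterior.mean(axis=1), posterior.std(axis=1))
```

In a real application `forward` would wrap a run of the groundwater model, which is why the ensemble size and the number of assimilations dominate the computational cost.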

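2.3 Groundwater model

The groundwater model of the middle reaches of the Heihe River Basin is built on MODFLOW (MODular three-dimensional finite-difference groundwater FLOW model).

↑ The model used to build the groundwater model

MODFLOW discretises time and space with the finite difference method to solve groundwater flow in three dimensions. The study area covers about 9016 km² in plan. The aquifer system is discretised horizontally with a 1 km × 1 km square grid, giving 132 rows × 165 columns (Figure 3).

Cells inside the study area are active and cells outside are inactive. In time, each month is one stress period and each day is one time step. The boundary conditions follow [41] and the natural boundaries (Figure 3). Because of the groundwater divide, AE is set as a no-flow boundary; E is the Zhengyixia hydrological station, where the Heihe River leaves the study area; survey data indicate that the aquifer media in the Beishan area resemble those of the Qilian Mountains to the south, but because precipitation is scarce and there is no perennial surface runoff, the groundwater volume is not comparable with that of the Qilian Mountains [26], so boundary DE is assumed to be a no-flow boundary; …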

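2.4 Model evaluation

3 Results and discussion

3.1 Algorithm parameter analysis (sensitivity analysis?)

When ESMDA-EnKF is used for uncertainty analysis of the groundwater model, some hyperparameters directly affect how well the assimilation updates the model parameters (for example the number of algorithm executions, which is also the inflation coefficient, Na; the number of observations n; and the parameter ensemble size N). This paper therefore first determines the best hyperparameters for ESMDA-EnKF and then uses the algorithm to analyse the uncertainty of the model parameters and their sensitivity to the observations.

↑ Hyperparameters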

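3.1.1 Effect of the number of ESMDA executions on the uncertainty analysis

ESMDA uses the EnKF to update the groundwater model parameters, so the number of EnKF executions directly affects the parameter update. There is as yet no theoretical study of the number of executions; the example in [40] ran the algorithm 2 and 4 times to compare assimilation performance. With computational cost in mind, this section evaluates assimilation performance after 1, 2, 3 and 4 EnKF executions (Figure 4).

Figure 4 shows assimilation performance after different numbers of EnKF executions; "Last iteration" denotes the performance after the final execution for each choice of execution count.

The figure shows that as the EnKF updates the model parameters, the RMSE of the groundwater level decreases, meaning that the simulated groundwater level approaches the observed level.
The first and second parameter updates after random sampling have a large effect on RMSE; after the third and fourth updates, RMSE changes little compared with the value after the second update.

This suggests that the first 2 updates with observations are the most effective, possibly because: (1) the groundwater model parameters are randomly initialised, so the first ESMDA update from random parameters is relatively large; (2) ESMDA updates with the observations from all times, using as much of the information in the observations as possible, so it can update the parameters more effectively and by a larger amount.

↑ QT: is it necessary to assimilate multiple times? Observations are uncertain too, so is RMSE the right measure?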

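3.1.2 Effect of the number of observations on the uncertainty analysis

The purpose of data assimilation is to fuse, as fully as possible, direct and indirect observations from different sources and with different temporal and spatial resolutions, in order to improve the accuracy of the model estimate and obtain a more accurate spatio-temporal distribution of the state variables [44-45]. For reasons of cost, however, unlimited observation of the relevant factors in a system is unrealistic, so it is necessary to study how the observations affect assimilation performance. Observed water levels at 42 observation wells and Heihe River discharge observations at the Gaoya and Zhengyixia hydrological stations were selected, and the effect on assimilation performance was studied for 7, 14, 21, 28, 35, 42 and 44 observations.

Figure 5 shows the RMSE for different numbers of observations after 2 parameter updates.

Because different types of observations and outputs are involved (groundwater level and discharge), the observed and simulated groundwater levels and discharges in the figure are normalised to the range [0, 1].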
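
The notes do not say how the normalisation was done. One plausible reading is min-max scaling per observation type before computing a single RMSE, sketched below. The scaling ranges, function names and all numbers are hypothetical, and simulated values outside the observed range would fall outside [0, 1] with this choice.

```python
import numpy as np


def min_max_normalise(values, lower, upper):
    """Scale values using a fixed range shared by observations and model outputs."""
    return (np.asarray(values, dtype=float) - lower) / (upper - lower)


def rmse(simulated, observed):
    """Root-mean-square error between simulated and observed values."""
    diff = np.asarray(simulated) - np.asarray(observed)
    return float(np.sqrt(np.mean(diff ** 2)))


# Hypothetical numbers: groundwater levels in metres, river discharge in m^3/s.
head_obs = np.array([1400.0, 1450.0, 1500.0])
head_sim = np.array([1410.0, 1445.0, 1500.0])
flow_obs = np.array([20.0, 60.0])
flow_sim = np.array([30.0, 60.0])

# Normalise each observation type by the range of its own observations.
head_lo, head_hi = head_obs.min(), head_obs.max()
flow_lo, flow_hi = flow_obs.min(), flow_obs.max()
obs_all = np.concatenate([min_max_normalise(head_obs, head_lo, head_hi),
                          min_max_normalise(flow_obs, flow_lo, flow_hi)])
sim_all = np.concatenate([min_max_normalise(head_sim, head_lo, head_hi),
                          min_max_normalise(flow_sim, flow_lo, flow_hi)])
combined_rmse = rmse(sim_all, obs_all)
print(combined_rmse)
```

Without such scaling, water levels of the order of a thousand metres would swamp the discharge errors in a combined RMSE.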

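The overall trend in Figure 5 shows that as observations are added, the parameters approach their true values and the RMSE of the groundwater level decreases. However, increasing the number of observations from 7 to 14 has no clear effect on assimilation performance; once the number of observation wells reaches 21, performance remains essentially unchanged; adding the discharge observations (42 to 44) has a more marked effect. Observations of a single type therefore do improve assimilation performance, but once their number reaches a certain level (21 in this experiment) their further effect is limited; adding observations of a different type (river discharge) improves performance further.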

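3.1.3 Effect of ensemble size on the uncertainty analysis

The EnKF represents the prior probability distribution of the model state variables with a sample ensemble and uses it to estimate the error covariance [46].

↑ How should the EnKF prior be understood? (The forecast step)

Thomas et al. [47] found that the background error covariance estimate and the optimal correlation length scale of the filter function in the EnKF both depend on ensemble size. When the ensemble is small, the covariance estimate is noisy, the optimal correlation length scale is small and filter divergence occurs. Ensemble size also directly affects run time. The algorithm was therefore run with different ensemble sizes to explore the effect on assimilation performance.

↑ The figure below shows no such dependence; I have doubts.

Figure 6 shows the RMSE after two parameter updates for parameter ensemble sizes of 30, 60, 90, 120 and 150. With an ensemble size of 30 there is no filter divergence, and adding samples has no clear effect on assimilation performance. The ensemble size is therefore set to 30 to balance performance and computational cost.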
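
The claim that small ensembles give noisy covariance estimates can be checked in isolation from the groundwater experiment. The sketch below measures the sampling error of a 2 × 2 sample covariance for the same ensemble sizes; the "true" covariance and the number of trials are made up, and it says nothing about whether that noise matters for the RMSE in Figure 6.

```python
import numpy as np

rng = np.random.default_rng(4)
true_cov = np.array([[1.0, 0.5], [0.5, 1.0]])  # hypothetical "true" covariance


def mean_covariance_error(n_members, n_trials=500):
    """Average Frobenius error of the sample covariance for a given ensemble size."""
    errors = []
    for _ in range(n_trials):
        sample = rng.multivariate_normal(np.zeros(2), true_cov, size=n_members)
        errors.append(np.linalg.norm(np.cov(sample, rowvar=False) - true_cov))
    return float(np.mean(errors))


noise = {n: mean_covariance_error(n) for n in (30, 60, 90, 120, 150)}
for n, err in noise.items():
    print(f"N = {n:3d}: mean covariance error {err:.3f}")
```

The error shrinks only slowly with ensemble size (roughly like one over the square root of N), so going from 30 to 150 members buys a modest improvement in the covariance estimate at five times the model runs.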

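3.2 Uncertainty analysis of the groundwater model parameters

With the algorithm parameters fixed by the analysis above, ESMDA-EnKF was used to analyse the uncertainty of the groundwater model parameters. The distribution of the model parameters is shown in Figure 7 and the settings in Table 1.

Thirty random samples were drawn from the prior distribution of the model parameters, and observations from 42 water level observation wells and 2 runoff gauging stations were used to update the parameters 2 times (Na=2). The parameter uncertainty obtained as ESMDA-EnKF runs is shown in Figure 8 (note: because there are many parameters and observation points, Figure 8 shows the updates for only some of them).

Figure 8 shows that, whatever the initial distribution of a parameter, each execution of ESMDA-EnKF updates the model parameters and brings them closer to their calibrated values.

The first execution updates the parameters markedly: after the first update most parameters are already fairly close to their calibrated values and their uncertainty is clearly reduced;
After the second update all parameters are distributed around their calibrated values and their uncertainty is reduced further.
The third column of Figure 8 shows the distribution of each parameter after ESMDA-EnKF has finished: the uncertainty of the permeability coefficients of the zones (P1~P8) is markedly reduced and they essentially converge to the neighbourhood of the optimal values.
Comparing the zones, the permeability coefficient of zone Ⅷ (P8) has the most dispersed distribution and the largest uncertainty after the ESMDA-EnKF update, possibly because there are few observations in this zone, so the algorithm cannot update the parameter effectively.
After the algorithm has finished, the riverbed hydraulic conductivities of Heihe sub-zones Ⅰ and Ⅱ (P9 and P10) have also converged to the neighbourhood of the optimal values, but their distributions are more dispersed (larger standard deviation) than that of sub-zone Ⅲ (P11). This may be because the Heihe River and groundwater interact frequently along the reaches in sub-zones Ⅰ and Ⅱ (the river recharges groundwater in sub-zone Ⅰ, and groundwater discharges into the river in sub-zone Ⅱ), which makes the riverbed hydraulic conductivity more uncertain.
Figure 8 shows that ESMDA-EnKF updates the model parameters effectively and so reduces their uncertainty.

↑ Can assimilation yield parameter distributions?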

…

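QT: sequential vs non-sequential assimilation, continuous vs intermittent assimilation

Link: http://blog.sina.com.cn/s/blog_d8f6ec6b0101t59b.html

↑ very good

A data assimilation cycle usually consists of four stages:

(1) quality control, (2) objective analysis, (3) initialisation, (4) a short-range forecast that produces a background field

↑ ???

Data assimilation is divided into two classes according to the time of the observations used:

The first is sequential assimilation, as run in operational assimilation systems, for example objective analysis, optimal interpolation, 3DVAR and EnKF;
The second is non-sequential assimilation, which also uses observations from future times, for example 4DVAR.

Judging by the progress and results of research on meteorological data assimilation, 3D-VAR and the Ensemble Kalman Filter, which represent sequential assimilation, and 4D-VAR, which represents global fitting, have become the focus of meteorological data assimilation research and the main direction for its future development.

Sequential data assimilation: the forecast background field and the observations are combined to give the analysis field; the forecast model then takes the analysis as its initial condition and forecasts forward to the next observation time, and so on until the end of the assimilation window, where the optimal model solution is obtained. Observation information is passed forward in sequence.
Four-dimensional variational assimilation: finds the model solution closest to the observations within an assimilation window. Observation information is passed both from the past to the present and from the present to the past.

↑ Sequential and non-sequential assimilation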
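
The forecast/analysis cycle of sequential assimilation can be written down for a scalar toy model. This is a sketch under my own assumptions (a linear growth model, known error variances, perfect observations of a synthetic truth), not something taken from the linked post.

```python
def sequential_assimilation(initial, initial_var, observations, obs_var,
                            growth=1.05, model_var=0.1):
    """Scalar forecast/analysis cycle for the model x_{k+1} = growth * x_k."""
    state, var = initial, initial_var
    analyses = []
    for obs in observations:
        # Forecast step: propagate the previous analysis to the observation time.
        state, var = growth * state, growth ** 2 * var + model_var
        # Analysis step: blend the forecast (background) with the observation.
        gain = var / (var + obs_var)
        state, var = state + gain * (obs - state), (1.0 - gain) * var
        analyses.append(state)
    return analyses, var


# Synthetic truth starting from 1.0; the first guess (2.0) is deliberately wrong.
truth = [1.0 * 1.05 ** k for k in range(1, 11)]
analyses, final_var = sequential_assimilation(2.0, 1.0, truth, obs_var=0.04)
print(analyses[-1], truth[-1], final_var)
```

Each observation is used once, at its own time, and its information only travels forward through the forecast, which is the contrast with 4DVAR drawn above.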

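Assimilation is also divided into continuous and intermittent according to the assimilation interval:

Continuous assimilation inserts observations valid at the model time throughout the continuous integration of the numerical model. New observations are introduced at short integration steps, forcing the forecast model to adjust to the shock of each new batch of data, and the forecast atmospheric state is continually adjusted to fit the observations.
Intermittent assimilation introduces observations at fixed intervals of the model integration (for example 3 or 6 hours). Operational systems generally use this intermittent assimilation analysis cycle. Observations within a time window centred on the analysis time are used for the analysis; the background field is a forecast started from the analysis of 3-6 hours earlier; initialisation follows the analysis, and the forecast model then produces another 3-6 hour forecast.

↑ Continuous and intermittent assimilation

↑ A classic figure in assimilation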

Reference: Automatic reservoir history matching based on EnKF and ES-MDA

Is this the same ES-MDA as in Reference 2?

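Reference: Application of FY-2G cloud-derived wind data assimilation in the numerical forecast of Typhoon "Hato" (天鸽) (2022)

Journal: Marine Forecasts (《海洋预报》)

This journal could not be looked up.
Authors: Xu Dongmei, et al.
Affiliation: Key Laboratory of Meteorological Disaster, Ministry of Education / Joint International Research Laboratory of Climate and Environment Change / Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China
Novelty: assimilates cloud-derived wind data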

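Abstract

Taking Typhoon No. 13 "Hato", which made landfall in China in August 2017, as a case study, and using data from the US Global Forecast System as the background field, the WRF mesoscale numerical model and the three-dimensional variational module of the Weather Research and Forecasting model data assimilation system were used to examine the effect of assimilating cloud-derived wind data from the new-generation geostationary satellite FY-2G on typhoon forecasts.

↑ The main novelty is the assimilation of FY-2G cloud-derived wind data

The results show that the typhoon track, intensity and maximum wind speed simulated with cloud-derived wind assimilation are closer to the observed values. Compared with the control experiment, assimilating cloud-derived winds supplies the background field with rich wind vector information and strengthens the convective clouds around the typhoon and their steering flow, so the internal structure of the typhoon is simulated better and the moisture and dynamic conditions that affect its development and maintenance are improved.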

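6 Conclusions

This paper selects the severe typhoon "Hato", which made landfall in China on 22 August 2017, uses version 3.9.1 of the WRF model and its assimilation system to examine the effect of assimilating FY-2G satellite cloud-derived wind data on the numerical typhoon forecast, runs a 30 h deterministic forecast, and analyses several factors that affect the typhoon's development and movement. The conclusions are:

(1) The assimilation experiment gives a smaller typhoon track forecast error. Because cloud-derived winds are concentrated in the middle and upper atmosphere, the simulated typhoon intensity does not acquire large errors from surface friction after landfall.

(2) Assimilating satellite cloud-derived winds adds wind vector information around the typhoon in the model initial field and adds positive wind vectors to the mid- and upper-level atmosphere around the typhoon. Vertical momentum transport between the middle and upper levels strengthens the steering flow that moves the typhoon northward, and the typhoon track is adjusted accordingly.

(3) Assimilating satellite cloud-derived winds lets the model simulate the typhoon structure better. The assimilation experiment produces a positive relative humidity increment near the typhoon centre, which favours the maintenance of the typhoon; the T-logP sounding simulated by the assimilation experiment is also closer to the observations.

This study carries out a 3D variational assimilation experiment with FY-2G cloud-derived wind data. The overall result for this case shows that assimilating cloud-derived winds effectively improves the numerical forecast of Typhoon "Hato".

It should be noted that:

This study is based on a single typhoon case; to test the effect of FY-2G cloud-derived winds fully, more typhoon cases should be examined.
In terms of method, this study uses 3DVAR; future work should consider more advanced methods, such as hybrid assimilation, to improve the forecast further.
In terms of data, this study uses only satellite cloud-derived winds, which carry upper-level wind information and lack low-level wind information; future work will try joint multi-source assimilation of cloud-derived winds and microwave radiometer data to improve the accuracy of numerical typhoon forecasts further.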

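2 Observations and assimilation system

2.1 FY-2G cloud-derived wind data and quality control scheme

…

↑ Introduces the FY-2G cloud-derived wind data

Before cloud-derived wind data are assimilated, their heights must be corrected. This paper uses the height correction scheme of He Zhixin [12]. …

↑ This may be the quality control scheme for this data

2.2 WRFDA assimilation system

The WRFDA assimilation system accompanies the WRF model. It can assimilate most conventional and non-conventional meteorological data and includes 3DVAR, four-dimensional variational assimilation (Four Dimensional VARiation, 4DVAR) and hybrid assimilation components.

↑ Why does WW3 have no accompanying assimilation system?

This experiment uses the 3DVAR method, which obtains the optimal solution balancing the observation, background and analysis fields by minimising the cost function J.

↑ 3D-VAR cost function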
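
The cost function itself is not reproduced in these notes. For orientation, the textbook form of the 3D-Var cost function is given below; I am assuming the paper's version is equivalent, and the symbols are the conventional ones rather than copied from it: $x_b$ is the background field, $y$ the observations, $H$ the observation operator, and $B$ and $R$ the background and observation error covariance matrices.

$$
J(x) = \frac{1}{2}(x - x_b)^T B^{-1} (x - x_b) + \frac{1}{2}\left(y - H(x)\right)^T R^{-1} \left(y - H(x)\right)
$$

The analysis is the $x$ that minimises $J$: the first term penalises departure from the background and the second penalises departure from the observations, each weighted by the inverse of its error covariance.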

↑ Design of the B matrix: go and read my teacher's paper on it

4 Experiment design

Reference: Some progress on ocean data assimilation in China: introduction to the special section on ocean data assimilation (2022)

English title: Some progress on ocean data assimilation in China: Introduction of the special section “Ocean Data Assimilation” 2022
Journal: Acta Oceanologica Sinica

Zone 2, https://www.letpub.com.cn/index.php?page=journalapp&view=detail&journalid=137
Authors: Huizan Wang, et al.
Affiliation: College of Meteorology and Oceanography, National University of Defense Technology, Changsha 410073, China
Novelty: an overview of the topics of data assimilation research in China in 2020

1 Introduction

This special section is the scientific legacy of the 13th National Ocean Data Assimilation and Numerical Simulation Conference of China, which was held in Changsha, China during December 3–4, 2020 with more than 160 participants from 35 units in China. It continued a series of National Ocean Data Assimilation conferences which began in 2003, which played an important role in the development of ocean data assimilation in China.

In this 13th conference, there were 41 oral presentations and 15 posters, which focused on

ocean data assimilation methods,
ocean reanalysis data,
developments and operational application of ocean data assimilation systems,
assimilation of new type ocean observations,
ocean targeted observation analysis,
coupled data assimilation,
ocean models,
parameter estimation, etc.

↑ Topics of data assimilation research in China in 2020

This conference was hosted by National University of Defense Technology, and cohosted by the Institute of Atmospheric Physics of Chinese Academy of Sciences, the South China Sea Institute of Oceanology of Chinese Academy of Sciences, the Second Institute of Oceanography of the Ministry of Natural Resources, Ocean University of China, Tianjin University and Hohai University.

From this conference, we can see that ocean data assimilation has made great progress. In summary, there are six main aspects as follows:

firstly, ocean data assimilation methods are improved continuously, especially new progress in hybrid data assimilation;
secondly, there are continuous breakthroughs in data assimilation application of new type ocean observations, such as Glider data, ocean current data, acoustic information, etc;
thirdly, the combination of ocean data assimilation and machine learning is showing more and more good prospects;
fourthly, with the development of the Earth system model technology, the coupled data assimilation has become a hot spot;
fifthly, ocean reanalysis and data fusion products are important for ocean study, some ocean reanalysis products, such as China’s CORA 2.0 reanalysis, have been greatly improved on the spatial and temporal resolution, and other new data fusion products, including the fusion of multi-source altimeter products, sea ice reanalysis dataset have been developed;
sixthly, some new ocean model parameter estimation have been proposed and show good results.

During the conference, the organizing committee announced Acta Oceanologica Sinica for publishing the research results on Ocean Data Assimilation. There are ten papers accepted for this special section, mainly from the first two aspects above, which are very important. In this special section,

Shen et al. (2022a) used the hybrid data assimilation method called Localized Weighted Ensemble Kalman Filter (LWEnKF) to assimilate along-track sea surface height (AT-SSH), swath sea surface temperature (S-SST) and in-situ temperature and salinity (T/S) profiles for checking the operational application potential of this filter;
Zhao et al. (2022) presented an improved approach based on the equivalent-weights particle filter (EWPF) that uses the proposal density to effectively improve the traditional particle filter, which was tested with the Lorenz 96 model numerical experiments;
Song et al. (2022) proposed a new nudging scheme for the operational prediction system of the National Marine Environmental Forecasting Center (NMEFC) of China, which mainly aimed at improving El Niño–Southern Oscillation (ENSO) and Indian Ocean Dipole (IOD) predictions;
Yang et al. (2022) designed a reconstruction method called the multi-scale high-order recursive filter (MHRF) to reproduce the refined structure of sea ice field, which is a combination of Van Vliet fourth-order recursive filter and the three-dimensional variational (3D-VAR) analysis;
Chen et al. (2022) designed two comparative reconstruction schemes under the optimal interpolation framework to diagnose and evaluate the contribution from satellite measurements and Argo observations to the reconstructed analysis, allowing for better configuration of assimilation parameters;
Zhang et al. (2022) applied the gradient-dependent optimal interpolation to reconstruct daily subsurface oceanic environmental information according to fishery dates and locations based on Argo temperature and salinity profiles;
Liu et al. (2022) investigated the sensitive areas in targeted observation for predicting the Kuroshio large meander (LM) path using the conditional nonlinear optimal perturbation approach with the Regional Ocean Modeling System (ROMS);
Shen et al. (2022b) developed a two-stage inflation method for parameter estimation, which can address the collapse of parameter ensemble due to the constant evolution of parameters and was applied in observation system simulation experiment with CESM;
Wu et al. (2022) applied empirical orthogonal function (EOF) analysis to a 50-year long time series of monthly mean positions of the Kuroshio path south of Japan from a regional reanalysis to explore temporal-spatial oceanic variation in relation with the three typical Kuroshio paths;
Han et al. (2022) developed two offline bias correction methods for sea surface temperature (SST) forecasts and validated the performances using bias correction experiments implemented in the South China Sea with six-year (2003–2008) datasets.

This special section systematically summarizes some, but not all, of the research results from the 13th National Ocean Data Assimilation and Numerical Simulation Conference of China. In the future,

ocean models will develop to higher resolution, and ocean data assimilation will also face the problem of processing non-linear non-Gaussian information.
With the development of earth system model, ocean model will also be coupled with atmosphere, land surface, ocean waves and sea ice models, the coupled data assimilation will be a hot topic;
and the development of machine learning also prompt us to seek the combination of data assimilation and machine learning.

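Reference: Data Assimilation for Chaotic Dynamics

Data Assimilation for Chaotic Dynamics
https://link.springer.com/chapter/10.1007/978-3-030-77722-7_1
Chaos is ubiquitous in physical systems. The associated sensitivity to initial conditions is a major obstacle to forecasting the weather and other geophysical fluid flows. Data assimilation is the process of reducing the uncertainty in the initial conditions through a judicious combination of model forecasts and real-time data.

In practice, it may not be only the uncertainty in the initial conditions that is reduced, but also model uncertainty and parameter uncertainty.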
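
Sensitivity to initial conditions is easy to see in a toy system. The sketch below uses the logistic map with r = 4, a standard chaotic example that I picked for brevity; it is not the system studied in the chapter, and the starting value and perturbation size are arbitrary.

```python
def logistic_orbit(x0, n_steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x), returning the whole orbit."""
    orbit = [x0]
    for _ in range(n_steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit


# Two initial conditions that differ by a tiny "measurement error".
reference = logistic_orbit(0.2, 60)
perturbed = logistic_orbit(0.2 + 1e-10, 60)
separation = [abs(a - b) for a, b in zip(reference, perturbed)]

for step in (0, 10, 20, 30, 40, 50):
    print(f"step {step:2d}: separation {separation[step]:.3e}")
```

The separation grows roughly exponentially until the two orbits are unrelated, which is why a forecast without fresh observations eventually loses all skill and why assimilation has to be repeated.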


Jincan

https://Liu-Jincan.github.io/2022/03/02/yan-jiu-sheng-justtry-function/endnote-shu-ju-tong-hua/05-find-idea-of-da/

Unless otherwise stated, all articles in this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source: Jincan.
