Sometimes it is necessary to go back and re-run the previous cycle to overcome an instability. Some manual steps are required to ensure that the model picks up from the correct point.
Let's call the current failed cycle N, and the previous cycle that we wish to re-run N-1.
In share/data/History_Data, the <suite-id>.xhist file needs to point to the correct dump.
You need to copy this from the cycle before the one you wish to re-run, i.e. cycle N-2.
cd work/<cycle>/coupled/history_archive
cp temp_hist.0001 ~/cylc-run/<suite>/share/data/History_Data/<suite>.xhist
Check that the xhist file now points to the dump for the cycle you wish to start from.
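One way to make this check less error-prone is to search the xhist file for the timestamp of the dump you expect. The sketch below is only an illustration: xhist_mentions is a hypothetical helper, and it assumes the timestamp appears in the dump filename recorded in the xhist file.

```shell
# Hypothetical helper: report whether an xhist file mentions a given dump timestamp.
# Assumption: the dump filename recorded in the xhist file contains the timestamp.
# Usage: xhist_mentions <xhist-file> <timestamp>
xhist_mentions() {
    xhist_file=$1
    stamp=$2
    if grep -q -- "$stamp" "$xhist_file"; then
        echo "OK: $xhist_file refers to $stamp"
    else
        echo "WARNING: $xhist_file does not mention $stamp" >&2
        return 1
    fi
}
```

For example, xhist_mentions ~/cylc-run/<suite>/share/data/History_Data/<suite>.xhist 20351001 would confirm that the file refers to a dump stamped 20351001. A match only shows that the timestamp is present, so it is still worth reading the file itself.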
Go to share/data/History_Data/NEMOhist.
Move the latest dump files out of the way, e.g. if the timestamp of the latest dump was 20351101:
mkdir SAVE
mv *20351101* SAVE
This should leave the dumps for the cycle you wish to start from as the latest NEMO dumps.
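If you would like a record of exactly which files were moved, the same step can be wrapped in a small function that prints each filename as it goes. This is a sketch only: stash_dumps is a hypothetical name, and it assumes you are already in the NEMOhist directory.

```shell
# Hypothetical helper: move every file whose name contains STAMP into SAVE/,
# printing each filename so the selection can be checked afterwards.
# Assumption: run from inside the NEMOhist directory.
stash_dumps() {
    stamp=$1
    mkdir -p SAVE
    for f in *"$stamp"*; do
        [ -e "$f" ] || continue    # the pattern matched nothing
        echo "moving $f"
        mv -- "$f" SAVE/
    done
}
```

Running stash_dumps 20351101 followed by ls should then show the earlier dumps as the most recent files left in the directory.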
Go to share/data/History_Data/CICEhist, and edit ice.restart_file to point to the appropriate file.
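The edit can also be scripted so that the old pointer is kept as a backup. The following is a minimal sketch, not a tested procedure for this suite: set_ice_restart is a hypothetical helper. It assumes that ice.restart_file holds a single line naming the restart file, and that the name you pass can be resolved from the current directory. Look at the existing contents first and use the same path style.

```shell
# Hypothetical helper: back up ice.restart_file, then point it at a new restart file.
# Assumptions: the pointer file is a single line naming the restart file, and the
# new name is given exactly as it should appear in the pointer file.
set_ice_restart() {
    new_restart=$1
    if [ ! -e "$new_restart" ]; then
        echo "ERROR: $new_restart does not exist" >&2
        return 1
    fi
    cp -p ice.restart_file ice.restart_file.bak || return 1
    printf '%s\n' "$new_restart" > ice.restart_file
    cat ice.restart_file    # show the new contents for confirmation
}
```

If the result is wrong, ice.restart_file.bak still holds the previous pointer and can be copied back.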
The suite can get confused when re-running earlier cycles, especially with the post-processing. Therefore it is important to hold the suite and re-run each task in order before continuing the run.
You will need to re-run the coupled task in cycle N-1, then cycle N, to check you have got past the crash point.
Then re-run all the post-processing tasks for cycle N-1 to ensure the new data is processed.
Follow the instructions below carefully. If in doubt ask Annette for guidance.
Re-run the coupled task for cycle N-1.
If the jdma task for N-1 had completed, you will need to delete the data from JDMA. It needs to finish uploading before it can be deleted. If, for example, JDMA is down and the upload hasn’t started, you can get JASMIN to cancel the request from the queue.
Once coupled N-1 has finished, manually trigger coupled for cycle N.
postproc: This may fail with an error like ValueError: Incorrect size for fixed length header; given 0 words but should be 256. Have a look at the a.p4* files in History_Data. Is there a zero-length file from the month before (so cycle N-2)? If so, delete this file and re-try. Do not delete any other files.
compress_netcdf: This may not show up in the graph. To insert it run: cylc insert SUITE-ID compress_netcdf.CYCLE-POINT
modify_netcdf_metadata
pptransfer
jdma: Check the old batch has been deleted first. If there is a delay here, you can proceed, but set jdma to failed first so we keep the task active and it’s clear it needs to be re-run.
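For the postproc step above, zero-length files can be found with find instead of checking file sizes by eye. The sketch below only lists candidates and leaves deletion as a deliberate manual step. list_empty_p4 is a hypothetical helper, and the pattern *a.p4* is an assumption about how the a.p4 files are named in History_Data.

```shell
# Hypothetical helper: list zero-length files matching *a.p4* in a directory.
# It only lists files; nothing is deleted.
# Assumption: the a.p4 files sit directly in the given directory.
list_empty_p4() {
    dir=${1:-.}
    find "$dir" -maxdepth 1 -type f -name '*a.p4*' -size 0
}
```

Run it as list_empty_p4 ~/cylc-run/<suite>/share/data/History_Data and check that any file reported really is the one from cycle N-2 before removing it.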