Type: New Feature
Affects Version/s: BIGDATA_RELEASE_1_3_1
Fix Version/s: None
Support the automatic re-replication of bad or missing HALogs during startup. This requires a met quorum, because the log files are replicated from the quorum leader. If the quorum is not met and there are bad or missing HALogs, then the service start will fail.
This applies only to HALogs for commit points earlier than the current commit point on the server. HALogs for the current commit point and HALogs for commit points beyond the last commit point recorded on a given service are automatically replicated when the service enters the RESYNC run state.
The main difference between this and RESYNC is that we have a pre-condition that there are no bad or missing HALogs during the HALogNexus startup procedure. Examples of bad HALogs include:
- logically empty HALogs (the closing root block has not been applied). This case is already handled when it is the most recent commit point on the journal that has a logically empty HALog.
- physically empty HALogs (nothing written).
- corrupt HALogs (checksum errors).
- missing HALogs (someone deleted them).
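The four cases above can be sketched as a simple classifier. This is only an illustration: the `HALogCheck`, `Scan`, and `Status` names are hypothetical and are not part of the HALogNexus API, and the order in which the checks are applied is an assumption.

```java
public class HALogCheck {

    /** Hypothetical status values mirroring the list of bad HALog cases. */
    enum Status { OK, MISSING, PHYSICALLY_EMPTY, CORRUPT, LOGICALLY_EMPTY }

    /** Hypothetical summary of what a scan of one HALog file found. */
    static final class Scan {
        final boolean exists;                  // false if the file was deleted
        final long length;                     // bytes on disk
        final boolean checksumsOk;             // false on any checksum error
        final boolean closingRootBlockApplied; // false if logically empty

        Scan(boolean exists, long length, boolean checksumsOk,
             boolean closingRootBlockApplied) {
            this.exists = exists;
            this.length = length;
            this.checksumsOk = checksumsOk;
            this.closingRootBlockApplied = closingRootBlockApplied;
        }
    }

    /** Map a scan result onto one status (check order is an assumption). */
    static Status classify(Scan s) {
        if (!s.exists) return Status.MISSING;
        if (s.length == 0L) return Status.PHYSICALLY_EMPTY;
        if (!s.checksumsOk) return Status.CORRUPT;
        if (!s.closingRootBlockApplied) return Status.LOGICALLY_EMPTY;
        return Status.OK;
    }
}
```

Any status other than `OK` would mark the HALog as a candidate for re-replication from the quorum leader.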
Failover relies on the availability of restore points, which in turn rely on HALogs. A service cannot start unless it has all historical HALogs up to its last commit point. Once it meets that criterion, it can enter RESYNC and then replicate any HALogs that it needs to catch up with the quorum and join.
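A minimal sketch of the proposed startup decision follows. It assumes that commit points can be compared by a numeric commit counter. The class, method, and enum names are hypothetical and do not correspond to existing HAJournalServer code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class HALogStartupGate {

    /** Hypothetical outcomes of the startup check. */
    enum Action { PROCEED_TO_RESYNC, REPLICATE_FROM_LEADER, FAIL_START }

    /**
     * Return, in ascending order, the commit counters whose HALog is bad or
     * missing and that lie strictly before the last commit point. HALogs at
     * or beyond the last commit point are left to RESYNC.
     */
    static List<Long> needStartupRepair(Set<Long> badOrMissing,
                                        long lastCommitCounter) {
        List<Long> out = new ArrayList<Long>();
        for (Long c : new TreeSet<Long>(badOrMissing)) {
            if (c < lastCommitCounter) {
                out.add(c);
            }
        }
        return out;
    }

    /** Decide what the service does at startup. */
    static Action decide(Set<Long> badOrMissing, long lastCommitCounter,
                         boolean quorumMet) {
        if (needStartupRepair(badOrMissing, lastCommitCounter).isEmpty()) {
            // All historical HALogs are good: the service may enter RESYNC.
            return Action.PROCEED_TO_RESYNC;
        }
        // Historical HALogs can only be re-replicated from the leader of a
        // met quorum; otherwise the service start fails.
        return quorumMet ? Action.REPLICATE_FROM_LEADER : Action.FAIL_START;
    }
}
```

For example, with a last commit counter of 10 and bad HALogs at counters 3 and 12, only counter 3 would be repaired at startup; counter 12 would be picked up later by RESYNC.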
See BLZG-859 (HAJournal.start() optimization)
See https://docs.google.com/presentation/d/1IdKQaBouV-a3Bjblk8PtXk-L29Rf_8VjInw7xWKwMDA/edit#slide=id.gaa912276_182 (HA state transitions)