Ryan Prior
A few strategies to deal with rollbacks after a server fault:
- Create a hierarchy of event types and prioritize persistence (disk I/O and DB writes) according to that hierarchy. At the top are the most consequential actions: crafting results, shattering crystals or breaking objects, and items being added to player inventories or moved onto a map tile or into a container. Below that are skill checks that lower HP, mind, or stamina. At the bottom are mob movements, hunger and thirst ticks, and regeneration from resting.
- Send events (at least the sufficiently consequential ones) to a highly available, low-latency persistent datastore like Kafka/Redpanda or QuestDB. That way, after a fault you can reconstruct the game state, at least in large part, by opening a window at the newest event the post-fault database knows about and replaying events from there until you're up to date.
- For low-priority events at the bottom of the consequence hierarchy, still batch the writes, but instead of saving the whole world every 5 minutes, partition the data by zone and save one partition at a time in a rotating fashion.
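The event-type hierarchy can be sketched as a priority queue that hands the most consequential events to storage first. This is a minimal sketch assuming a fixed per-tick write budget; the tier names, event-type strings, and `PersistenceQueue` class are hypothetical, not taken from any real engine.

```python
import heapq
import itertools
from enum import IntEnum


class Priority(IntEnum):
    """Persistence tiers; a lower value is written first."""
    CRITICAL = 0  # crafting results, shattered crystals, inventory/tile/container moves
    VITALS = 1    # skill checks that lower HP, mind, or stamina
    AMBIENT = 2   # mob movement, hunger/thirst ticks, resting regen


# Hypothetical event-type names; a real game would define its own.
EVENT_PRIORITY = {
    "craft_result": Priority.CRITICAL,
    "crystal_shattered": Priority.CRITICAL,
    "item_moved": Priority.CRITICAL,
    "skill_check": Priority.VITALS,
    "mob_moved": Priority.AMBIENT,
    "hunger_tick": Priority.AMBIENT,
}


class PersistenceQueue:
    """Buffers events and releases them to storage highest-priority first."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, dict]] = []
        self._seq = itertools.count()  # tie-breaker: FIFO within a tier, never compares dicts

    def push(self, event: dict) -> None:
        # Unknown event types default to the lowest tier.
        prio = EVENT_PRIORITY.get(event["type"], Priority.AMBIENT)
        heapq.heappush(self._heap, (prio, next(self._seq), event))

    def drain(self, budget: int) -> list[dict]:
        """Pop up to `budget` events (e.g. one tick's write allowance) in priority order."""
        batch = []
        while self._heap and len(batch) < budget:
            _, _, event = heapq.heappop(self._heap)
            batch.append(event)
        return batch


q = PersistenceQueue()
for t in ["mob_moved", "skill_check", "craft_result", "hunger_tick", "item_moved"]:
    q.push({"type": t})
print([e["type"] for e in q.drain(budget=3)])  # ['craft_result', 'item_moved', 'skill_check']
```

Because each tick's write budget is spent top-down, a crash between ticks tends to lose low-tier events (mob positions, hunger ticks) before it loses crafting results or inventory moves.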
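The replay strategy can be sketched with an in-memory list standing in for a Kafka/Redpanda topic. This is not a client for any of those systems: the `Event` and `Snapshot` types, the inventory-only game state, and the assumption that the database stores the offset of the last event it applied are all illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    offset: int   # position in the log, analogous to a Kafka/Redpanda partition offset
    kind: str     # hypothetical kinds: "add_item" or "remove_item"
    player: str
    item: str


@dataclass
class Snapshot:
    """Game state as the database holds it, plus the newest event it reflects."""
    inventories: dict[str, list[str]] = field(default_factory=dict)
    last_offset: int = -1  # -1 means no events applied yet


def apply(state: dict[str, list[str]], ev: Event) -> None:
    inv = state.setdefault(ev.player, [])
    if ev.kind == "add_item":
        inv.append(ev.item)
    elif ev.kind == "remove_item" and ev.item in inv:
        inv.remove(ev.item)


def recover(snapshot: Snapshot, log: list[Event]) -> Snapshot:
    """Replay every logged event newer than the snapshot's last applied offset."""
    state = {p: list(items) for p, items in snapshot.inventories.items()}  # don't mutate input
    last = snapshot.last_offset
    for ev in sorted(log, key=lambda e: e.offset):
        if ev.offset <= last:
            continue  # already reflected in the database before the fault
        apply(state, ev)
        last = ev.offset
    return Snapshot(state, last)


# In-memory stand-in for the durable event log.
log = [
    Event(0, "add_item", "ann", "sword"),
    Event(1, "add_item", "ann", "crystal"),
    Event(2, "remove_item", "ann", "crystal"),
    Event(3, "add_item", "bob", "shield"),
]
# The database had only persisted through offset 1 when the server died.
db = Snapshot({"ann": ["sword", "crystal"]}, last_offset=1)
print(recover(db, log).inventories)  # {'ann': ['sword'], 'bob': ['shield']}
```

Skipping offsets at or below `last_offset` keeps replay idempotent: running `recover` on an already-recovered snapshot changes nothing, so no event is applied twice.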
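The rotating per-zone save can be sketched as a round-robin flusher over buffered low-priority changes. Assumptions: the zone names, the `write_zone` storage callback, and the `RotatingZoneSaver` class are hypothetical, and changes are buffered as key/value updates per zone.

```python
import itertools
from typing import Callable


class RotatingZoneSaver:
    """Flushes one zone's buffered low-priority changes per tick, round-robin."""

    def __init__(self, zones: list[str], write_zone: Callable[[str, dict], None]) -> None:
        self._rotation = itertools.cycle(zones)
        self._pending: dict[str, dict] = {z: {} for z in zones}
        self._write_zone = write_zone  # hypothetical storage callback (disk or DB write)

    def record(self, zone: str, key: str, value: object) -> None:
        """Buffer a low-priority change; a later value for the same key replaces the earlier one."""
        self._pending[zone][key] = value

    def tick(self) -> str:
        """Save the next zone in the rotation and return its name."""
        zone = next(self._rotation)
        batch, self._pending[zone] = self._pending[zone], {}
        if batch:  # skip the write entirely if nothing changed
            self._write_zone(zone, batch)
        return zone


written = []
saver = RotatingZoneSaver(["forest", "caves", "town"], lambda z, b: written.append((z, b)))
saver.record("caves", "mob:17", (4, 9))
saver.record("caves", "mob:17", (5, 9))  # coalesced with the previous update
saver.record("town", "player:ann:hunger", 62)
print([saver.tick() for _ in range(4)])  # ['forest', 'caves', 'town', 'forest']
print(written)  # [('caves', {'mob:17': (5, 9)}), ('town', {'player:ann:hunger': 62})]
```

With 12 zones and the old 5-minute cycle, `tick()` would run every 25 seconds: each zone is still saved once every 5 minutes, but each write covers roughly a twelfth of the world, and repeated updates to the same key collapse into one before they reach storage.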