Save Files

From RogueBasin

Most roguelike games need a savefile, but building robust and maximally useful savefiles is surprisingly hard. This article covers problems relating to savefiles and saved games, and how to design and implement saving and restoring in ways that avoid those problems.

Motivation

A good savefile format can make a big difference in how people can use your game and how you can develop it. If savefiles can be loaded and played on different machines, on different operating systems, and on architectures with different endianness, then people using all of those systems can cooperate and share games with each other. Here are a few examples:

Bug Reports

As a developer, it is great when users help with development by sending you bug reports accompanied by a savefile. Then you can load the savefile and see what's wrong. But if you're running 64-bit Linux on a dual-CPU system, the person making the bug report is running something else, and you therefore can't load their savefile, you've lost a very important opportunity for your users to help you.

Tournaments

In a tournament, someone starts a character, makes a savefile, and distributes it. The players then compete to see who can do best with that character, and send in the savefiles of their games. This is a great form of community building and inter-player cooperation. But if the savefile won't load on any platform or operating system except the one where it was created, many players who want to participate won't be able to.

Alternatively, someone can set up a tournament server where users log in remotely and play over SSH or an equivalent. That way all savefiles are created on the server, so they are compatible with the server. But many users don't like SSH and character interfaces, many games don't support remote terminal play, and some laptops spend a lot of time off the network. There is also a certain amount of effort involved: either your game has a server mode that you must spend effort getting right, or it takes a fairly advanced networking and shell-scripting geek to set up a tournament server. So this approach has its own problems.

Shared Systems

On shared systems, dozens of different people may be playing your game at any moment. When the sysadmin installs the next version of the game, those saved games are lost unless the new version can load and use them. Now suppose people start new games, and then the sysadmin learns that the new version has a bad bug and downgrades to the previous version. All the new games are lost unless the previous version can read and use the savefiles generated by the new version. This makes the players miserable, and they in turn make the sysadmin miserable. You should prevent that if you can, and cultivate the sheer karmic goodness of causing more enjoyment than misery. Also, if your game causes the sysadmin more misery than happiness, they'll just uninstall it.

Users with several different machines

I develop on a 64-bit dual-processor machine. When I go on the road, I do it with a much smaller, much lighter, 32-bit laptop running a different operating system. I should be able to play on my development machine when I'm at home, because it has a bigger screen and a nicer keyboard. But if that savefile won't load on the laptop, I can't take it with me when I go. When it happens, this makes me sad.

Bones Files

In NetHack, there is an option to create what are called bones files. These are essentially saved levels, written when a character dies and the game is lost.

Bones files allow subsequent characters to find the remains (and equipment) of previous characters. For variety, many players participate in an inter-player service called Hearse, which swaps bones files around among users. That way it becomes possible to find the remains and equipment of previous characters played by many other players out on the network, instead of just on the local machine.

But Hearse is limited because NetHack does not save its data in a platform-neutral way. Bones files created by different versions of NetHack, on different operating systems, or on different machine architectures cannot be loaded. So for any particular user, the number of other Hearse users who can actually swap bones files with them is half or less of the total. If you use a minority operating system or a minority architecture, only a very few other players can swap bones files with you.


Save the Random Seed!

Different compilers and platforms implement random number generation differently. If you want cross-platform savefiles, put your own random number generator in your code, so that it is under your control and identical on every platform, then save its state when you write a savefile and restore it when you read one back. This avoids cross-platform incompatibility and many other problems.
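As an illustration, here is a minimal sketch of a generator whose whole state is one 64-bit integer, so saving and restoring it is trivial. It assumes Python and uses the well-known SplitMix64 algorithm; the class and method names (`SplitMix64`, `save_state`, `from_saved_state`) are hypothetical, not taken from any particular game.

```python
MASK64 = (1 << 64) - 1


class SplitMix64:
    """A small, fully specified PRNG whose entire state is one 64-bit integer."""

    def __init__(self, seed: int) -> None:
        self.state = seed & MASK64

    def next_u64(self) -> int:
        # SplitMix64 step: advance by a fixed odd constant, then scramble the bits.
        self.state = (self.state + 0x9E3779B97F4A7C15) & MASK64
        z = self.state
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        return z ^ (z >> 31)

    def randrange(self, n: int) -> int:
        # Modulo reduction is very slightly biased unless n divides 2**64;
        # the bias is negligible for dice-sized n.
        return self.next_u64() % n

    def save_state(self) -> str:
        # Hex text reads the same on every platform, regardless of endianness.
        return f"{self.state:016x}"

    @classmethod
    def from_saved_state(cls, text: str) -> "SplitMix64":
        rng = cls(0)
        rng.state = int(text, 16) & MASK64
        return rng


# Usage: save mid-game, restore, and get exactly the same future rolls.
rng = SplitMix64(20240601)
first_rolls = [rng.randrange(6) + 1 for _ in range(3)]
saved = rng.save_state()  # e.g. written to the savefile as "rng=<hex>"
expected = [rng.randrange(6) + 1 for _ in range(5)]
restored = SplitMix64.from_saved_state(saved)
assert [restored.randrange(6) + 1 for _ in range(5)] == expected
```

Because the algorithm is defined by a few lines of integer arithmetic rather than by a platform library, every build of the game produces the same sequence from the same saved state.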

Pointer Based Data Structures

If you use a lot of dynamic data structures, it's hard in most languages to save them to a file and restore them meaningfully. The problem is that when the program reloads them, heap-allocated objects are unlikely to end up at the same memory addresses. So when you reload an object A containing a pointer to object B, the pointer no longer points at object B.

Some languages have libraries that "pickle" the runtime state automatically, but using them is likely to run into some of the other problems described below. If you decide not to use such a library, or your language doesn't provide one, you'll have to think hard about how you store things and how to write them to a file so that you can read them all back.

Object ID numbers

One way to address the pointer issue is to have each object contain a unique ID number. Whenever you are about to write an object containing pointers, you can make a copy of it replacing all the pointer values with the ID numbers of the objects they point to and write that instead. Then, when you're reading things back, you can index them by their ID number. When you've read everything back, you can go through all your objects and replace ID numbers with real pointers again.
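A minimal sketch of this two-pass approach, assuming Python with JSON as the plain-text container; the `Item` class and its `container` pointer are hypothetical examples, not part of any particular game.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Item:
    oid: int                            # unique ID number
    name: str
    container: Optional["Item"] = None  # a pointer to another object


def save_items(items: List[Item]) -> str:
    # Write a copy of each object with its pointer replaced by the target's ID.
    records = [
        {"oid": i.oid, "name": i.name,
         "container": i.container.oid if i.container is not None else None}
        for i in items
    ]
    return json.dumps(records)


def load_items(text: str) -> List[Item]:
    records = json.loads(text)
    # Pass 1: recreate every object and index it by ID number.
    by_id = {r["oid"]: Item(r["oid"], r["name"]) for r in records}
    # Pass 2: once everything exists, turn ID numbers back into real pointers.
    for r in records:
        if r["container"] is not None:
            by_id[r["oid"]].container = by_id[r["container"]]
    return list(by_id.values())
```

Pass 2 raises `KeyError` if a savefile refers to an ID that was never written; a real loader would treat that as a corrupt value.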

Object Hash Tables

In fact, the indexing doesn't need to be temporary. If you keep your objects in hash tables indexed by ID number, every other object in the game can refer to them by ID number instead of by pointer. This simplifies save and restore a lot, because ID numbers keep the same meaning across save and restore. It also simplifies memory management, because you can deallocate objects without worrying that pointers to them still exist somewhere. A dangling pointer can't be checked, and dereferencing one may crash your program or silently corrupt memory. A hash lookup of a dangling ID number just returns a null pointer, which you can check for and handle.
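A minimal sketch of such an ID-keyed object table, assuming Python, where a dict plays the role of the hash table and `None` the null pointer. `World`, `Monster`, and `target_id` are hypothetical names for illustration.

```python
from typing import Dict, Optional


class Monster:
    def __init__(self, name: str, target_id: Optional[int] = None) -> None:
        self.name = name
        self.target_id = target_id  # refers to another object by ID, not by pointer


class World:
    """Owns every object; everything else holds ID numbers."""

    def __init__(self) -> None:
        self.objects: Dict[int, Monster] = {}
        self.next_id = 1  # saved with the game so IDs stay unique after reload

    def add(self, obj: Monster) -> int:
        oid = self.next_id
        self.next_id += 1
        self.objects[oid] = obj
        return oid

    def remove(self, oid: int) -> None:
        # No need to hunt down references: stale IDs simply stop resolving.
        self.objects.pop(oid, None)

    def get(self, oid: Optional[int]) -> Optional[Monster]:
        # A dangling (or absent) ID yields None, which callers can check for.
        if oid is None:
            return None
        return self.objects.get(oid)
```

For example, if an orc is targeting the hero and the hero is removed, `world.get(orc.target_id)` returns `None` instead of touching freed memory.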


Plain Text

There are a lot of advantages to storing save files as plain UTF-8 text. Ideally the format should be readable (at least by developers who have a fairly detailed key), and plain text files can be catted, grepped, stream-edited, filtered, pasted into text documents, loaded into an editor, and so on. It's also easy to make plain text files work the same on all platforms.

Section Delimiters

Variable-length sections should be separated by unique delimiters that mark both the start and the end of each section. NetHack does this another way, by writing the length of each section followed by the section itself. The problem is that one corrupted section or miscounted length means you've lost the rest of the file. Also, with length prefixes, if you edit a savefile and change the length of a section, you have to recalculate the lengths of every enclosing region that contains that section. If you don't want to roll your own delimited format, there are many libraries that save data as XML markup or as S-expressions.
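Rolling your own can be quite small. The sketch below assumes Python and a hypothetical `BEGIN name` / `END name` line format, and assumes section bodies never contain lines that start with `BEGIN ` (a real format would escape them). It shows the main advantage over length prefixes: when one section is damaged, the reader resynchronizes at the next `BEGIN` and keeps the rest of the file.

```python
from typing import Dict, List


def write_sections(sections: Dict[str, List[str]]) -> str:
    out: List[str] = []
    for name, lines in sections.items():
        out.append(f"BEGIN {name}")
        out.extend(lines)
        out.append(f"END {name}")
    return "\n".join(out) + "\n"


def read_sections(text: str) -> Dict[str, List[str]]:
    sections: Dict[str, List[str]] = {}
    current = None
    body: List[str] = []
    for line in text.splitlines():
        if current is None:
            # Outside any section: ignore junk until the next BEGIN.
            if line.startswith("BEGIN "):
                current, body = line[len("BEGIN "):], []
        elif line == f"END {current}":
            sections[current] = body
            current = None
        elif line.startswith("BEGIN "):
            # Missing END: drop the damaged section and resync here.
            current, body = line[len("BEGIN "):], []
        else:
            body.append(line)
    return sections
```

Editing a section by hand never requires recomputing anything elsewhere in the file, since no line records another section's size.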

Corrupt values

When a data structure is read from disk, the code has to check every value in it for sanity and take appropriate action if it reads an insane or impossible value. Such values should be replaced with defaults, with a warning to the player and a chance to opt out of loading.
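A minimal sketch of this check, assuming Python and a hypothetical table of fields, defaults, and sanity tests. It returns the cleaned values together with a list of warnings, so the caller can show them to the player and offer to abort the load.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical schema: field -> (default value, sanity test)
SCHEMA: Dict[str, Tuple[int, Callable[[int], bool]]] = {
    "hp": (10, lambda v: v >= 0),
    "max_hp": (10, lambda v: 1 <= v <= 9999),
    "depth": (1, lambda v: 1 <= v <= 50),
}


def sanitize(raw: Dict[str, str]) -> Tuple[Dict[str, int], List[str]]:
    """Parse and check each field; substitute defaults and collect warnings."""
    clean: Dict[str, int] = {}
    warnings: List[str] = []
    for field, (default, is_sane) in SCHEMA.items():
        text = raw.get(field)
        try:
            value = int(text)  # TypeError if missing, ValueError if garbled
            if not is_sane(value):
                raise ValueError(f"out of range: {value}")
        except (TypeError, ValueError):
            warnings.append(f"{field}: bad value {text!r}, using {default}")
            value = default
        clean[field] = value
    # Cross-field check: current HP can't exceed maximum HP.
    if clean["hp"] > clean["max_hp"]:
        warnings.append(f"hp {clean['hp']} exceeds max_hp, clamping")
        clean["hp"] = clean["max_hp"]
    return clean, warnings
```

If `warnings` is non-empty, the game would list them and ask whether to continue before using `clean`.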

Version Stability

Insofar as possible, it should be easy to add, change, and delete monster and item types without breaking savefile compatibility. A savefile from any version should load in any later version, and ideally in many earlier versions too. There are plenty of games (and yes, I mean big-name games like NetHack and Angband) whose savefiles break whenever the monster or item list changes. Obviously, since those games are successful, you don't *have* to fix this, but it's not terribly hard to do, so why not?

Versioned Savefiles

The game should ignore savefile sections marked with delimiters it doesn't use, much as many XML-consuming programs ignore elements they don't recognize. Where something fundamental changes, later versions should save sections in both the earlier form (which they and later versions will skip whenever the later form is present) and the later form (which earlier game versions will silently skip).
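One way to implement this rule is sketched below, assuming Python and a hypothetical section-naming convention such as `player.v1` / `player.v2`. Each game version lists, for every logical section, the forms it can read, newest first; it loads the newest form present and skips everything else.

```python
from typing import Dict, List, Tuple


def choose_sections(present: List[str],
                    readable: Dict[str, List[str]]) -> Tuple[Dict[str, str], List[str]]:
    """For each logical section, pick the newest form this version can read.

    present:  section names found in the savefile
    readable: logical name -> forms this version understands, newest first
    Returns the chosen forms and the section names that were skipped.
    """
    chosen: Dict[str, str] = {}
    for logical, forms in readable.items():
        for form in forms:
            if form in present:
                chosen[logical] = form
                break
    skipped = [name for name in present if name not in chosen.values()]
    return chosen, skipped


# A newer game wrote both forms of "player" plus a section older games lack.
file_sections = ["player.v1", "player.v2", "pets.v2"]
old_game = {"player": ["player.v1"]}
new_game = {"player": ["player.v2", "player.v1"], "pets": ["pets.v2"]}
print(choose_sections(file_sections, old_game))
print(choose_sections(file_sections, new_game))
```

The old game reads `player.v1` and skips the rest; the new game prefers `player.v2` and skips `player.v1`, but still falls back to it when loading a genuinely old savefile.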

Warn the Player!

Some techniques for achieving version stability cause visible changes in the player's game. Even though the player is loading their savefile into a newer game version, they may not want these changes. The machine may belong to someone who has a different game version installed, or the program may have been updated without the player's knowledge. In any case, a player may want to finish a game in the same version they started it in, so before changing anything during a version conversion, you should warn the player and give them the option to refuse: not load the savefile, and keep their game unchanged. They may prefer to take the savefile home and finish the game on their laptop or something.

Version Stable Identifiers

Particular item and monster types should have version-stable identifiers for use in savefiles. One way to create these is by concatenating the name of the type with the version in which it was introduced or most recently changed.
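A minimal sketch of building such identifiers, assuming Python; `stable_id` and `MONSTER_TYPES` are hypothetical names. The important design point is that the version part is written by hand next to each type and bumped only when that type changes; it is never derived from the current game version.

```python
def stable_id(type_name: str, version: str) -> str:
    """Build an ID such as GOBLIN_ARCHER_1_01 from a name and a version."""
    name_part = "_".join(type_name.upper().split())
    version_part = version.replace(".", "_")
    return f"{name_part}_{version_part}"


# Hypothetical type table. Each version string records when that type was
# introduced or last changed, so unchanged types keep their IDs forever.
MONSTER_TYPES = {
    stable_id("goblin", "1.01"): {"name": "goblin", "ai": "pack"},
    stable_id("goblin archer", "1.01"): {"name": "goblin archer", "ai": "pack"},
    stable_id("orc", "1.00"): {"name": "orc", "ai": "individual"},
}
```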

Preimages

New monsters and items introduced after V1.00 should have a savefile form that identifies their 'preimages': the monster or item types from earlier versions, back to V1.00, that they replace or derive from.

Simple Conversions

If you have GOBLIN_1_00 as a monster type "goblin" introduced in version 1.00, and you change something about goblins when making version 1.01, (for example reclassifying their AI from "individual" to "pack" in order to change their tactics) then you have GOBLIN_1_01 as a new monster type, also named "goblin", whose preimage is GOBLIN_1_00. In a simple conversion, every GOBLIN_1_00 in the savefile becomes a GOBLIN_1_01 when the game is reloaded.
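A minimal sketch of simple conversion, assuming Python and a hypothetical history table in which every type ID the game has ever used maps to its preimage. Following preimage links forward also handles chains such as GOBLIN_1_00 to GOBLIN_1_01 to a later GOBLIN_1_02.

```python
from typing import Dict, List, Optional, Set

# Hypothetical history kept by version 1.02: every type ID ever used -> its preimage.
PREIMAGE: Dict[str, Optional[str]] = {
    "GOBLIN_1_00": None,
    "GOBLIN_1_01": "GOBLIN_1_00",
    "GOBLIN_1_02": "GOBLIN_1_01",
    "ORC_1_00": None,
}
# Types this version can actually instantiate.
CURRENT: Set[str] = {"GOBLIN_1_02", "ORC_1_00"}


def simple_upgrade(type_id: str) -> str:
    """Follow preimage links forward until reaching a current type."""
    successors: Dict[str, List[str]] = {}
    for tid, pre in PREIMAGE.items():
        if pre is not None:
            successors.setdefault(pre, []).append(tid)
    while type_id not in CURRENT:
        nxt = successors.get(type_id, [])
        if len(nxt) != 1:
            # Zero successors: unknown or deleted type.
            # Several successors: this needs diversifying conversion instead.
            raise KeyError(f"no unique successor for {type_id}")
        type_id = nxt[0]
    return type_id
```

Any conversion that actually changes an ID is a visible change, so it is the point at which the game should warn the player and offer to cancel the load.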

Diversifying Conversion

Diversifying conversion means introducing new monster or item types during game restore. For example, if V1.01 also introduces GOBLIN_ARCHER_1_01 "goblin archer", whose preimage is also GOBLIN_1_00, then when the V1.01 game loads a V1.00 savefile it converts a known percentage of the saved goblins into goblin archers, based on their respective rarities in V1.01. You may want to allow simple conversion between monsters with identical names while disallowing diversifying conversion among monsters with different names.
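A minimal sketch of choosing a replacement weighted by rarity, assuming Python. The type table and its frequencies are hypothetical, and `random.Random` stands in for the game's own saved generator.

```python
import random
from typing import Dict, Optional, Tuple

# Hypothetical v1.01 data: type ID -> (preimage, relative generation frequency)
TYPES_1_01: Dict[str, Tuple[Optional[str], int]] = {
    "GOBLIN_1_01": ("GOBLIN_1_00", 30),
    "GOBLIN_ARCHER_1_01": ("GOBLIN_1_00", 10),
    "ORC_1_00": (None, 20),
}


def diversify(old_id: str, rng: random.Random) -> str:
    """Replace an old type with one of its successors, weighted by rarity."""
    candidates = [(tid, freq) for tid, (pre, freq) in TYPES_1_01.items()
                  if pre == old_id]
    if not candidates:
        return old_id  # not a preimage of anything: leave it alone
    ids = [tid for tid, _ in candidates]
    weights = [freq for _, freq in candidates]
    return rng.choices(ids, weights=weights, k=1)[0]
```

With these weights, roughly one saved goblin in four becomes a goblin archer. Restricting conversion to same-named monsters would just mean filtering `candidates` by name.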

Version Down Conversions

Down conversions happen when someone loads a savefile into an earlier version of the game than the one that created it. This is why preimages have to be part of the saved data rather than only part of the game data: when version 1.00 of your game loads a savefile containing GOBLIN_ARCHER_1_01 or GOBLIN_1_01, which it doesn't know about, the savefile itself tells it to replace both with instances of GOBLIN_1_00.
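A minimal sketch of both halves, assuming Python; the function names are hypothetical. The writer stores the full preimage chain of every type the savefile uses, and an older reader walks that chain backward until it reaches a type it knows.

```python
from typing import Dict, Optional, Set


def preimage_table(used: Set[str],
                   all_preimages: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Writer side: collect the preimage chain of every type in the savefile."""
    table: Dict[str, Optional[str]] = {}
    for tid in used:
        current: Optional[str] = tid
        while current is not None and current not in table:
            table[current] = all_preimages.get(current)
            current = table[current]
    return table


def downgrade(type_id: str, saved_preimages: Dict[str, Optional[str]],
              known: Set[str]) -> str:
    """Reader side: walk back through saved preimages to a known type."""
    current = type_id
    while current not in known:
        pre = saved_preimages.get(current)
        if pre is None:
            raise KeyError(f"{type_id}: no known preimage")
        current = pre
    return current


# Version 1.01 writes a savefile; version 1.00 reads it.
ALL_1_01 = {"GOBLIN_1_00": None, "GOBLIN_1_01": "GOBLIN_1_00",
            "GOBLIN_ARCHER_1_01": "GOBLIN_1_00", "ORC_1_00": None}
saved = preimage_table({"GOBLIN_ARCHER_1_01", "GOBLIN_1_01", "ORC_1_00"}, ALL_1_01)
known_1_00 = {"GOBLIN_1_00", "ORC_1_00"}
```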

Eliminating Monsters

In the above example, the monster type GOBLIN_1_00 was completely eliminated from the game. Version 1.01 still knows that identifier because it knows two monsters that have it as their preimage. But what happens if you eliminate something completely and nothing in a later version has it as a preimage? What does V1.11 do if it reads a LAVA_NYMPH_1_00 and knows of nothing that has lava nymphs as a preimage? You have several choices, but they all come down to one rule: the game still has to know the ID of every creature that has existed in earlier versions.

Never Eliminate Anything

One way to handle it is simple: never eliminate anything. Maybe V1.11 will never, under any circumstances, generate a lava nymph. But it still knows what lava nymphs are and how to run them, and will restore them just fine if it reads them in a savefile. This is probably the simplest approach, but it means your monster and item lists, with all their associated code, accumulate forever.

The Game Must Know a 'Postimage' for Deleted Monsters/Items

Another way is to keep a table of deleted monster and item types, each associated with a postimage. Thus V1.11 knows that when it reads a LAVA_NYMPH_1_00, it should load a FIRE_ELEMENTAL_1_04 instead, even though fire elementals don't have lava nymphs as a preimage. This is probably the most practical approach: the table of names keeps accumulating, but you only have to keep the code and data relevant to your current working set.
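A minimal sketch of such a postimage table, assuming Python and hypothetical data for V1.11. Because a postimage might itself be deleted in a later version, the lookup follows the chain, with a guard against accidental cycles.

```python
from typing import Dict, Set

# Hypothetical V1.11 data.
CURRENT: Set[str] = {"FIRE_ELEMENTAL_1_04", "GOBLIN_1_01"}
POSTIMAGE: Dict[str, str] = {
    "LAVA_NYMPH_1_00": "FIRE_ELEMENTAL_1_04",  # deleted type -> stand-in
}


def resolve_deleted(type_id: str) -> str:
    """Map a possibly-deleted type to a type this version can load."""
    seen: Set[str] = set()
    while type_id not in CURRENT:
        # A postimage may itself be deleted later, so follow the chain.
        if type_id in seen or type_id not in POSTIMAGE:
            raise KeyError(f"unknown type {type_id}")
        seen.add(type_id)
        type_id = POSTIMAGE[type_id]
    return type_id
```

Only the string-to-string table accumulates over time; none of the lava nymph's code or data needs to be kept.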

Ungeneratable Replacements

Another way is to have a designated replacement monster with the eliminated monster as its preimage, even though the replacement can never be generated by any means other than loading a savefile containing the earlier monster. So you could have a LAVA_NYMPH_NERFED_NO_BEACH_BALL_1_04 whose preimage is LAVA_NYMPH_1_00, and nobody would ever see it unless they loaded a savefile containing a lava nymph into version 1.04 or later. This is a simple conversion applied to monsters that can't otherwise be generated. It is probably more trouble than it's worth.

Edited Savefiles

Plain Text is Easily Editable

If you have plain text savefiles, so you can read and edit them, then other people will also read and edit them.

You Probably want Savefiles To Be Editable

In fact, as a developer you will probably want to be able to edit savefiles and then load and play them. So you have to decide whether players doing the same is okay with you and whether you need to take steps to try to stop it.

Editing a Savefile Is Cheating

If your users are individuals who don't care about each other's scores, there is no problem at all. But if you want to support a user community, high scores achieved by editing savefiles should be considered cheating, and you should take steps to make it hard to create an edited savefile that appears unedited. Inter-user services such as Hearse should be able to reject the vast majority of games resulting from edited savefiles, and so should automated scripts on tournament servers.

Record of Moves as a Data Integrity Check

The only way I can think of to do this is to save a record of every move and rerun it on the tournament server, to make sure it reaches the same state that the rest of the savefile describes. But there are two problems with that.

Is Version Conversion Cheating?

First, it's going to mark version-converted games as having been edited. That means any shared high-score list is going to ignore games that were finished on a different version than the one on which they were started. That's sad, but probably okay; free version conversion is probably abusable, and restricting it is understandable as part of any tournament rules.

How About Infinite Savescumming?

Second, if the move record is readable and editable, it lets people "rewind time" arbitrarily by replaying the move sequence from a savefile into the game up to a chosen past point. This enables easy, arbitrary savescumming to any point in the game's history, which lets players cheat almost as freely as if edited savefiles were accepted.

A Possible Workaround Involving Encryption

To avoid this, you would need to encrypt the section of the savefile that stores every move, using an asymmetric cipher. The game needs access to the encryption key, but the decryption key can stay secret, provided that (as with a conventional savefile that has no move sequence) the game never needs to decrypt the move sequence itself. Someone running a tournament would create a key pair, keep the decryption key on the tournament server, and distribute the encryption key with the tournament savefile. When people send in their savefiles, the tournament server decrypts the moves section and replays it on its local copy of the game.

You can't really stop cheating. You can only make it hard.

That said, the keystrokes entered into the game can also be recorded outside it, for example by the operating system, so the move record isn't really secret and encryption can't truly protect it. You can't really prevent cheating; you can only make it harder. You have to decide how much of your time and effort that is worth, and whether it's worth anything at all to your users.

Hashed save data

It may be possible to detect any kind of savefile editing by computing a cryptographic hash of the entire file and putting the digest in the file's header. The game would have a secret key hardcoded in its source and would use it to key the hash, so that only the game knows how to produce a valid digest. The game would reject a savefile with an invalid digest, and only the game could generate a digest it would accept later.

Cheating would still be possible if the key leaked, as has happened with the private keys for PSN, AACS, and others. Since the key is bundled with the game client, someone may be able to decompile the program or find the key in running memory, and if the game's source code is public, the key is public too. Still, this makes cheating much harder, and anyone who circumvents it is probably doing more work than playing the game properly would take, and might even deserve to be allowed to cheat.

See SHA-256.
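A minimal sketch of a keyed digest in the savefile header, assuming Python's standard `hmac` and `hashlib` modules. It uses HMAC-SHA-256, the standard construction for keying a hash; simply prepending a secret to the data before hashing it with plain SHA-256 is vulnerable to length-extension attacks. The header format and `SECRET_KEY` value are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would be compiled into the game binary.
SECRET_KEY = b"example-key-not-actually-secret"
HEADER = "HMAC-SHA256 "


def _digest(body: str) -> str:
    return hmac.new(SECRET_KEY, body.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_savefile(body: str) -> str:
    # First line carries the digest; everything after it is the savefile body.
    return f"{HEADER}{_digest(body)}\n{body}"


def check_savefile(text: str) -> str:
    header, sep, body = text.partition("\n")
    if not sep or not header.startswith(HEADER):
        raise ValueError("missing digest header")
    claimed = header[len(HEADER):].encode("utf-8")
    # Constant-time comparison, so timing doesn't leak how much of it matched.
    if not hmac.compare_digest(claimed, _digest(body).encode("utf-8")):
        raise ValueError("savefile has been modified")
    return body
```

The body stays plain, readable text; editing any byte of it, or the digest, makes `check_savefile` reject the file.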

Replaying Saved Games

If you're saving a record of each move and your game can already replay games for verification, then by all means make saved games watchable on screen: it should be possible to use a savefile to observe a replay of the saved game. For starters, it makes a cool screensaver. Second, carefully observing your past actions and their consequences is another way to learn from your mistakes. Third, it lets people watch and possibly learn from each other's playing styles if they exchange savefiles. Fourth, it lets the developer watch what actually happened when they get a bug report.
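Verification and watching both come down to the same replay loop. The sketch assumes Python and a hypothetical, fully deterministic `step` rule whose random numbers come from a 64-bit LCG (Knuth's MMIX constants) stored in the game state. A tournament server would pass in its own copy of the starting savefile rather than the one the player sent back; a viewer would draw the screen after each `step`.

```python
from typing import Dict, List

State = Dict[str, int]
MOVES = {"h": (-1, 0), "j": (0, 1), "k": (0, -1), "l": (1, 0)}


def step(state: State, move: str) -> State:
    """Hypothetical deterministic game rule: move, then take 0-2 random damage."""
    s = dict(state)
    dx, dy = MOVES.get(move, (0, 0))
    s["x"] += dx
    s["y"] += dy
    # 64-bit LCG whose state is part of the saved game state.
    s["rng"] = (s["rng"] * 6364136223846793005 + 1442695040888963407) % 2**64
    s["hp"] -= (s["rng"] >> 60) % 3
    s["turn"] += 1
    return s


def replay(start: State, moves: List[str]) -> State:
    state = start
    for move in moves:
        state = step(state, move)  # a viewer would redraw here
    return state


def verify(start: State, moves: List[str], claimed_final: State) -> bool:
    """Accept a game only if replaying its moves reproduces its final state."""
    return replay(start, moves) == claimed_final
```

Replay only works if every source of randomness is part of the saved state, which is one more reason to keep the random number generator under your own control.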

Players should be able to learn from high-scoring games

Players should be able to retrieve the savefile associated with any entry on the highscores page. The highscores page itself, or "hall of fame", would just be a particular way for the game to display the contents of a directory in which high-scoring savefiles are kept.