NAME
zrepl - zrepl Documentation
zrepl is a one-stop, integrated solution for ZFS replication.
GETTING STARTED
The 10 minute quick-start guides give you a first impression.
MAIN FEATURES
ATTENTION: zrepl as well as this documentation is still under active
development. There is no stability guarantee on the RPC protocol or
configuration format, but we do our best to document breaking changes in the
changelog.
CONTRIBUTING
We are happy about any help we can get!
TABLE OF CONTENTS
Quick Start by Use Case
The goal of this quick-start guide is to give you an impression of how zrepl can accommodate your use case.
Install zrepl
Follow the OS-specific installation instructions and come back here.
Overview Of How zrepl Works
Check out the overview section to get a rough idea of what you are going to configure in the next step, then come back here.
Configuration Examples
zrepl is configured through a YAML configuration file in /etc/zrepl/zrepl.yml. We have prepared example use cases that showcase typical deployments and different functionality of zrepl. We encourage you to read through all of the examples to get an idea of what zrepl has to offer, and how you can mix and match configurations for your use case. Keep the full config documentation handy if a config snippet is unclear.
Example Use Cases
Continuous Backup of a Server
This config example shows how we can back up our ZFS-based server to another machine using a zrepl push job.
Our backup solution should fulfill the following requirements:
Analysis
We can model this situation as two jobs:
Generate TLS Certificates
We use the TLS client authentication transport to protect our data on the wire. To get things going quickly, we skip setting up a CA and generate two self-signed certificates as described here. For convenience, we generate the key pairs on our local machine and distribute them using ssh:

(name=backups; openssl req -x509 -sha256 -nodes \
  -newkey rsa:4096 \
  -days 365 \
  -keyout $name.key \
  -out $name.crt \
  -addext "subjectAltName = DNS:$name" \
  -subj "/CN=$name")
(name=prod; openssl req -x509 -sha256 -nodes \
  -newkey rsa:4096 \
  -days 365 \
  -keyout $name.key \
  -out $name.crt \
  -addext "subjectAltName = DNS:$name" \
  -subj "/CN=$name")

ssh root@backups "mkdir /etc/zrepl"
scp backups.key backups.crt prod.crt root@backups:/etc/zrepl
ssh root@prod "mkdir /etc/zrepl"
scp prod.key prod.crt backups.crt root@prod:/etc/zrepl

Note that alternative transports exist, e.g. via TCP without TLS, or ssh.

Configure server prod
We define a push job named prod_to_backups in /etc/zrepl/zrepl.yml on host prod:

jobs:
- name: prod_to_backups
  type: push
  connect:
    type: tls
    address: "backups.example.com:8888"
    ca: /etc/zrepl/backups.crt
    cert: /etc/zrepl/prod.crt
    key: /etc/zrepl/prod.key
    server_cn: "backups"
  filesystems: {
    "zroot<": true,
    "zroot/var/tmp<": false,
    "zroot/usr/home/paranoid": false
  }
  snapshotting:
    type: periodic
    prefix: zrepl_
    interval: 10m
  pruning:
    keep_sender:
    - type: not_replicated
    - type: last_n
      count: 10
    keep_receiver:
    - type: grid
      grid: 1x1h(keep=all) | 24x1h | 30x1d | 6x30d
      regex: "^zrepl_"

Configure server backups
We define a corresponding sink job named sink in /etc/zrepl/zrepl.yml on host backups:

jobs:
- name: sink
  type: sink
  serve:
    type: tls
    listen: ":8888"
    ca: "/etc/zrepl/prod.crt"
    cert: "/etc/zrepl/backups.crt"
    key: "/etc/zrepl/backups.key"
    client_cns:
    - "prod"
  root_fs: "storage/zrepl/sink"

Go Back To Quickstart Guide
Click here to go back to the quickstart guide.

Local Snapshots + Offline Backup to an External Disk
This config example shows how we can use zrepl to take periodic snapshots of our local workstation and back it up to a zpool on an external disk which we occasionally connect. The local snapshots should be taken every 15 minutes for pain-free recovery from CLI disasters (rm -rf / and the like). However, we do not want to keep the snapshots around for very long because our workstation is a little tight on disk space. Thus, we only keep one hour's worth of high-resolution snapshots, then fade them out to one per hour for a day (24 hours), then one per day for 14 days. At the end of each work day, we connect our external disk that serves as our workstation's local offline backup. We want zrepl to inspect the filesystems and snapshots on the external pool, figure out which snapshots were created since the last time we connected the external disk, and use incremental replication to efficiently mirror our workstation to our backup disk. Afterwards, we want to clean up old snapshots on the backup pool: we want to keep all snapshots younger than one hour, 24 for each hour of the first day, then 360 daily backups. A few additional requirements:
The following config snippet implements the setup described above. You will likely want to customize some aspects mentioned in the top comment in the file. # This config serves as an example for a local zrepl installation that # backups the entire zpool `system` to `backuppool/zrepl/sink` # # The requirements covered by this setup are described in the zrepl documentation's # quick start section which inlines this example. # # CUSTOMIZATIONS YOU WILL LIKELY WANT TO APPLY: # - adjust the name of the production pool `system` in the `filesystems` filter of jobs `snapjob` and `push_to_drive` # - adjust the name of the backup pool `backuppool` in the `backuppool_sink` job # - adjust the occurences of `myhostname` to the name of the system you are backing up (cannot be easily changed once you start replicating) # - make sure the `zrepl_` prefix is not being used by any other zfs tools you might have installed (it likely isn't) jobs: # this job takes care of snapshot creation + pruning - name: snapjob type: snap filesystems: { "system<": true, } # create snapshots with prefix `zrepl_` every 15 minutes snapshotting: type: periodic interval: 15m prefix: zrepl_ pruning: keep: # fade-out scheme for snapshots starting with `zrepl_` # - keep all created in the last hour # - then destroy snapshots such that we keep 24 each 1 hour apart # - then destroy snapshots such that we keep 14 each 1 day apart # - then destroy all older snapshots - type: grid grid: 1x1h(keep=all) | 24x1h | 14x1d regex: "^zrepl_.*" # keep all snapshots that don't have the `zrepl_` prefix - type: regex negate: true regex: "^zrepl_.*" # This job pushes to the local sink defined in job `backuppool_sink`. # We trigger replication manually from the command line / udev rules using # `zrepl signal wakeup push_to_drive` - type: push name: push_to_drive connect: type: local listener_name: backuppool_sink client_identity: myhostname filesystems: { "system<": true } send: encrypted: true replication: protection: initial: guarantee_resumability # Downgrade protection to guarantee_incremental which uses zfs bookmarks instead of zfs holds. # Thus, when we yank out the backup drive during replication # - we might not be able to resume the interrupted replication step because the partially received `to` snapshot of a `from`->`to` step may be pruned any time # - but in exchange we get back the disk space allocated by `to` when we prune it # - and because we still have the bookmarks created by `guarantee_incremental`, we can still do incremental replication of `from`->`to2` in the future incremental: guarantee_incremental snapshotting: type: manual pruning: # no-op prune rule on sender (keep all snapshots), job `snapshot` takes care of this keep_sender: - type: regex regex: ".*" # retain keep_receiver: # longer retention on the backup drive, we have more space there - type: grid grid: 1x1h(keep=all) | 24x1h | 360x1d regex: "^zrepl_.*" # retain all non-zrepl snapshots on the backup drive - type: regex negate: true regex: "^zrepl_.*" # This job receives from job `push_to_drive` into `backuppool/zrepl/sink/myhostname` - type: sink name: backuppool_sink root_fs: "backuppool/zrepl/sink" serve: type: local listener_name: backuppool_sink Click here to go back to the quickstart guide. Fan-out replicationThis quick-start example demonstrates how to implement a fan-out replication setup where datasets on a server (A) are replicated to multiple targets (B, C, etc.).This example uses multiple source jobs on server A and pull jobs on the target servers. 
WARNING: Before implementing this setup, please see the caveats
listed in the fan-out replication configuration overview.
OverviewOn the source server (A), there should be:
On each target server, there should be:
Generate TLS CertificatesMutual TLS via the TLS client authentication transport can be used to secure the connections between the servers. In this example, a self-signed certificate is created for each server without setting up a CA.source=a.example.com targets=( b.example.com c.example.com # ... ) for server in "${source}" "${targets[@]}"; do openssl req -x509 -sha256 -nodes \ -newkey rsa:4096 \ -days 365 \ -keyout "${server}.key" \ -out "${server}.crt" \ -addext "subjectAltName = DNS:${server}" \ -subj "/CN=${server}" done # Distribute each host's keypair for server in "${source}" "${targets[@]}"; do ssh root@"${server}" mkdir /etc/zrepl scp "${server}".{crt,key} root@"${server}":/etc/zrepl/ done # Distribute target certificates to the source scp "${targets[@]/%/.crt}" root@"${source}":/etc/zrepl/ # Distribute source certificate to the targets for server in "${targets[@]}"; do scp "${source}.crt" root@"${server}":/etc/zrepl/ done Configure source server Ajobs: # Separate job for snapshots and pruning - name: snapshots type: snap filesystems: 'tank<': true # all filesystems snapshotting: type: periodic prefix: zrepl_ interval: 10m pruning: keep: # Keep non-zrepl snapshots - type: regex negate: true regex: '^zrepl_' # Time-based snapshot retention - type: grid grid: 1x1h(keep=all) | 24x1h | 30x1d | 12x30d regex: '^zrepl_' # Source job for target B - name: target_b type: source serve: type: tls listen: :8888 ca: /etc/zrepl/b.example.com.crt cert: /etc/zrepl/a.example.com.crt key: /etc/zrepl/a.example.com.key client_cns: - b.example.com filesystems: 'tank<': true # all filesystems # Snapshots are handled by the separate snap job snapshotting: type: manual # Source job for target C - name: target_c type: source serve: type: tls listen: :8889 ca: /etc/zrepl/c.example.com.crt cert: /etc/zrepl/a.example.com.crt key: /etc/zrepl/a.example.com.key client_cns: - c.example.com filesystems: 'tank<': true # all filesystems # Snapshots are handled by the separate snap job snapshotting: type: manual # Source jobs for remaining targets. Each one should listen on a different port # and reference the correct certificate and client CN. # - name: target_c # ... Configure each target serverjobs: # Pull from source server A - name: source_a type: pull connect: type: tls # Use the correct port for this specific client (eg. B is 8888, C is 8889, etc.) address: a.example.com:8888 ca: /etc/zrepl/a.example.com.crt # Use the correct key pair for this specific client cert: /etc/zrepl/b.example.com.crt key: /etc/zrepl/b.example.com.key server_cn: a.example.com root_fs: pool0/backup interval: 10m pruning: keep_sender: # Source does the pruning in its snap job - type: regex regex: '.*' # Receiver-side pruning can be configured as desired on each target server keep_receiver: # Keep non-zrepl snapshots - type: regex negate: true regex: '^zrepl_' # Time-based snapshot retention - type: grid grid: 1x1h(keep=all) | 24x1h | 30x1d | 12x30d regex: '^zrepl_' Go Back To Quickstart GuideClick here to go back to the quickstart guide.Use zrepl configcheck to validate your configuration. No output indicates that everything is fine. NOTE: Please open an issue on GitHub if your use case for zrepl
is significantly different from those listed above. Or even better, write it
up in the same style as above and open a PR!
Apply Configuration Changes
We hope that you have found a configuration that fits your use case. Use zrepl configcheck once again to make sure the config is correct (no output indicates that everything is fine). Then restart the zrepl daemon on all systems involved in the replication, likely using service zrepl restart or systemctl restart zrepl.
WARNING: Please read up carefully on the pruning rules before
applying the config. In particular, note that most example configs apply to
all snapshots, not just zrepl-created snapshots. Use the following keep rule
on sender and receiver to prevent this:
- type: regex
  negate: true
  regex: "^zrepl_.*" # <- the 'prefix' specified in snapshotting.prefix

Watch it Work
Run zrepl status on the active side of the replication setup to monitor snapshotting, replication and pruning activity. To re-trigger replication (snapshots are separate!), use zrepl signal wakeup JOBNAME (refer to the example use case document if you are uncertain which job you want to wake up). You can also use basic UNIX tools to see what's going on. If you like tmux, here is a handy script that works on FreeBSD:

pkg install gnu-watch tmux
tmux new -s zrepl -d
tmux split-window -t zrepl "tail -f /var/log/messages"
tmux split-window -t zrepl "gnu-watch 'zfs list -t snapshot -o name,creation -s creation'"
tmux split-window -t zrepl "zrepl status"
tmux select-layout -t zrepl tiled
tmux attach -t zrepl

The Linux equivalent might look like this:

# make sure tmux is installed & let's assume you use systemd + journald
tmux new -s zrepl -d
tmux split-window -t zrepl "journalctl -f -u zrepl.service"
tmux split-window -t zrepl "watch 'zfs list -t snapshot -o name,creation -s creation'"
tmux split-window -t zrepl "zrepl status"
tmux select-layout -t zrepl tiled
tmux attach -t zrepl

What Next?
Installation
TIP: Check out the quick-start guides if you want a first impression of zrepl.
User Privileges
It is possible to run zrepl as an unprivileged user in combination with ZFS delegation. On FreeBSD, it is also possible to run it in a jail by delegating a dataset to the jail.
TIP: Check out the installation-freebsd-jail-with-iocage
for FreeBSD jail setup instructions.
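The exact ZFS permissions required for an unprivileged setup depend on your job types; as a rough, hedged sketch (the user name zrepl and the dataset names are placeholders, consult zfs-allow(8) and your job configuration for the authoritative set):

# on the sending side: snapshotting, bookmarks, holds and sends
zfs allow -u zrepl send,snapshot,bookmark,hold,release,destroy tank

# on the receiving side: receives under the sink job's root_fs
zfs allow -u zrepl receive,create,mount,destroy storage/zrepl/sink

Depending on the platform, additional privileges (e.g. access to /dev/zfs or the ability to mount) may still be needed; verify with zrepl configcheck and a test replication before relying on it.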
Packageszrepl source releases are signed & tagged by the author in the git repository. Your OS vendor may provide binary packages of zrepl through the package manager. Additionally, binary releases are provided on GitHub. The following list may be incomplete, feel free to submit a PR with an update:
Debian / Ubuntu APT repositories
We maintain APT repositories for Debian, Ubuntu and derivatives. The fingerprint of the signing key is E101 418F D3D6 FBCB 9D65 A62D 7086 99FC 5F2E BF16. It is available at https://zrepl.cschwarz.com/apt/apt-key.asc. Please open an issue on GitHub if you encounter any issues with the repository.

(
set -ex
zrepl_apt_key_url=https://zrepl.cschwarz.com/apt/apt-key.asc
zrepl_apt_key_dst=/usr/share/keyrings/zrepl.gpg
zrepl_apt_repo_file=/etc/apt/sources.list.d/zrepl.list

# Install dependencies for subsequent commands
sudo apt update && sudo apt install curl gnupg lsb-release

# Deploy the zrepl apt key.
curl -fsSL "$zrepl_apt_key_url" | gpg --dearmor | sudo tee "$zrepl_apt_key_dst" > /dev/null

# Add the zrepl apt repo.
ARCH="$(dpkg --print-architecture)"
CODENAME="$(lsb_release -i -s | tr '[:upper:]' '[:lower:]') $(lsb_release -c -s | tr '[:upper:]' '[:lower:]')"
echo "Using Distro and Codename: $CODENAME"
echo "deb [arch=$ARCH signed-by=$zrepl_apt_key_dst] https://zrepl.cschwarz.com/apt/$CODENAME main" | sudo tee "$zrepl_apt_repo_file"

# Update apt repos.
sudo apt update
)

NOTE: Until zrepl reaches 1.0, the repositories will be updated
to the latest zrepl release immediately. This includes breaking changes
between zrepl versions. Use apt-mark hold zrepl to prevent upgrades of
zrepl.
RPM repositories
We provide a single RPM repository for all RPM-based Linux distros. The zrepl binary in the repo is the same as the one published to GitHub. Since Go binaries are statically linked, the RPM should work almost everywhere. The fingerprint of the signing key is F6F6 E8EA 6F2F 1462 2878 B5DE 50E3 4417 826E 2CE6. It is available at https://zrepl.cschwarz.com/rpm/rpm-key.asc. Please open an issue on GitHub if you encounter any issues with the repository. Copy-paste the following snippet into your shell to set up the zrepl repository. Then dnf install zrepl and make sure to confirm that the signing key matches the one shown above.

cat > /etc/yum.repos.d/zrepl.repo <<EOF
[zrepl]
name = zrepl
baseurl = https://zrepl.cschwarz.com/rpm/repo
gpgkey = https://zrepl.cschwarz.com/rpm/rpm-key.asc
EOF

NOTE: Until zrepl reaches 1.0, the repository will be updated
to the latest zrepl release immediately. This includes breaking changes
between zrepl versions. If that bothers you, use the dnf versionlock
plugin to pin the version of zrepl on your system.
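For example, assuming the versionlock plugin is provided by dnf-plugins-core on your distro, pinning could look like this:

sudo dnf install 'dnf-command(versionlock)'
sudo dnf versionlock add zrepl
# later, to allow upgrades again:
sudo dnf versionlock delete zrepl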
Compile From SourceProducing a release requires Go 1.11 or newer and Python 3 + pip3 + docs/requirements.txt for the Sphinx documentation. A tutorial to install Go is available over at golang.org. Python and pip3 should probably be installed via your distro's package manager.
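A minimal, hedged sketch of a source build follows; the repository URL is the upstream GitHub project, the venv is only needed for the documentation, and the exact build invocation may differ between releases:

git clone https://github.com/zrepl/zrepl.git
cd zrepl

# documentation build dependencies only (skip if you just want the binary)
python3 -m venv .venv && . .venv/bin/activate
pip install -r docs/requirements.txt

# fetch Go dependencies and build the zrepl binary
./lazy.sh godep
go build -o zrepl .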
The Python venv is used for the documentation build dependencies. If you just want to build the zrepl binary, leave it out and use ./lazy.sh godep instead. Alternatively, you can use the Docker build process: it is used to produce the official zrepl binary releases and serves as a reference for build dependencies and procedure:

cd to/your/zrepl/checkout
# make sure your user has access to the docker socket
make release-docker
# if you want .deb or .rpm packages, invoke the following
# targets _after_ you invoked release-docker
make deb-docker
make rpm-docker
# build artifacts are available in ./artifacts/release
# packages are available in ./artifacts

NOTE: It is your job to install the built binary in the zrepl
user's $PATH, e.g. /usr/local/bin/zrepl. Otherwise, the
examples in the quick-start guides may need to be adjusted.
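For example (the artifact name below is a placeholder; pick the one matching your platform from ./artifacts/release):

sudo install -m 0755 ./artifacts/release/zrepl-<os>-<arch> /usr/local/bin/zrepl
zrepl version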
FreeBSD Jail With iocage
This tutorial shows how zrepl can be installed in a jail on FreeBSD or FreeNAS using iocage. While this tutorial focuses on iocage, much of the setup would be similar with a different jail manager.
NOTE: From a security perspective, keep in mind that zfs send/recv was never designed with jails in mind: an attacker who gains access to the jail could probably crash the receive-side kernel or, worse, induce stateful damage to the receive-side pool.
The jail provides management benefits, not security benefits.
Requirements
A dataset that will be delegated to the jail needs to be created if one does not already exist. For this tutorial, tank/zrepl will be used:
zfs create -o mountpoint=none tank/zrepl
The only software requirement on the host system is iocage, which can be installed from ports or packages:
pkg install py37-iocage
NOTE: By default iocage will "activate" on
first use which will set up some defaults such as which pool will be used. To
activate iocage manually the iocage activate command can be
used.
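For example, to activate iocage manually on the pool used in this tutorial:

zpool list
iocage activate tank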
Jail CreationThere are two options for jail creation using FreeBSD.
Manual JailCreate a jail, using the same release as the host, called zrepl that will be automatically started at boot. The jail will have tank/zrepl delegated into it.iocage create --release "$(freebsd-version -k | cut -d '-' -f '1,2')" --name zrepl \ boot=on nat=1 \ jail_zfs=on \ jail_zfs_dataset=zrepl \ jail_zfs_mountpoint='none' Enter the jail: iocage console zrepl Install zrepl pkg update && pkg upgrade pkg install zrepl Create the log file /var/log/zrepl.log touch /var/log/zrepl.log && service newsyslog restart Tell syslogd to redirect facility local0 to the zrepl.log file: service syslogd reload Enable the zrepl daemon to start automatically at boot: sysrc zrepl_enable="YES" Now jump to the summary below. PluginWhen using the plugin, zrepl will be installed for you in a jail using the following iocage properties.
Additionally, the delegated dataset should be specified upon creation, and starting on boot can optionally be enabled. This can also be done from the FreeNAS web UI.
fetch https://raw.githubusercontent.com/ix-plugin-hub/iocage-plugin-index/master/zrepl.json -o /tmp/zrepl.json
iocage fetch -P /tmp/zrepl.json --name zrepl jail_zfs_dataset=zrepl boot=on
Configuration
Now zrepl can be configured. Enter the jail:
iocage console zrepl
Modify the /usr/local/etc/zrepl/zrepl.yml configuration file.
TIP: Check out the quick-start guides for examples of a sink job.
Now zrepl can be started:
service zrepl start
Now jump to the summary below.
Summary
Congratulations, you have a working jail!
NOTE: With FreeBSD 13's transition to OpenZFS 2.0, please
ensure that your jail's FreeBSD version matches the one in the kernel module.
If you are getting cryptic errors such as cannot receive new filesystem
stream: invalid backup stream the instructions posted here might
help.
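A hedged sketch of checking for such a mismatch and upgrading the jail (the release string is a placeholder; verify against iocage(8) before running):

freebsd-version -k                     # kernel (host) version
iocage exec zrepl freebsd-version -u   # userland version inside the jail
iocage upgrade -r <host-release> zrepl # align the jail with the host release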
What next?
Read the configuration chapter and then continue with the usage chapter. Reminder: If you want a quick introduction, please read the quick-start guides.
Configuration
Overview & Terminology
All work zrepl does is performed by the zrepl daemon, which is configured in a single YAML configuration file loaded on startup. The following paths are considered:
The zrepl configcheck subcommand can be used to validate the configuration. The command will output nothing and exit with a zero status code if the configuration is valid. The error messages vary in quality and usefulness: please report confusing config errors to the tracking issue #155. Full example configs such as those in the quick-start guides or the config/samples/ directory might also be helpful. However, copy-pasting examples is no substitute for reading documentation!
Config File Structure

global: ...
jobs:
- name: backup
  type: push
- ...

zrepl is configured using a single YAML configuration file with two main sections: global and jobs. The global section is filled with sensible defaults and is covered later in this chapter. The jobs section is a list of jobs which we are going to explain now.
Jobs & How They Work Together
A job is the unit of activity tracked by the zrepl daemon. The type of a job determines its role in a replication setup and in snapshot management. Jobs are identified by their name, both in log files and in the zrepl status command.
NOTE: The job name is persisted in several places on disk and
thus cannot be changed easily.
Replication always happens between a pair of jobs: one is the active side, and one the passive side. The active side connects to the passive side using a transport and starts executing the replication logic. The passive side responds to requests from the active side after checking its permissions. The following table shows how different job types can be combined to achieve both push and pull mode setups. Note that snapshot-creation denoted by "(snap)" is orthogonal to whether a job is active or passive.
How the Active Side Works
The active side (push and pull job) executes the replication and pruning logic:
TIP: The progress of the active side can be watched live using
the zrepl status subcommand.
How the Passive Side Works
The passive side (sink and source) waits for connections from the corresponding active side, using the transport listener type specified in the serve field of the job configuration. When a client connects, the transport listener performs listener-specific access control (cert validation, IP ACLs, etc.) and determines the client identity. The passive side job then uses this client identity as follows:
TIP: The implementation of the sink job requires that
the connecting client identities be valid ZFS filesystem name
components.
How Replication Works
One of the major design goals of the replication module is to avoid any duplication of the nontrivial logic. As such, the code works on abstract sender and receiver endpoints, where typically one is implemented by a local program object and the other is an RPC client instance. Regardless of push- or pull-style setup, the logic executes on the active side, i.e. in the push or pull job. The following high-level steps take place during replication and can be monitored using the zrepl status subcommand:
The idea behind the execution order of replication steps is that if the sender snapshots all filesystems simultaneously at fixed intervals, the receiver will have all filesystems snapshotted at time T1 before the first snapshot at T2 = T1 + $interval is replicated.
ZFS Background Knowledge
This section gives some background knowledge about ZFS features that zrepl uses to provide guarantees for a replicated filesystem. Specifically, zrepl guarantees by default that incremental replication is always possible and that started replication steps can always be resumed if they are interrupted.
ZFS Send Modes & Bookmarks
ZFS supports full sends (zfs send fs@to) and incremental sends (zfs send -i @from fs@to). Full sends are used to create a new filesystem on the receiver with the send-side state of fs@to. Incremental sends only transfer the delta between @from and @to. Incremental sends require that @from be present on the receiving side when receiving the incremental stream. Incremental sends can also use a ZFS bookmark as from on the sending side (zfs send -i #bm_from fs@to), where #bm_from was created using zfs bookmark fs@from fs#bm_from. The receiving side must always have the actual snapshot @from, regardless of whether the sending side uses @from or a bookmark of it.
Plain and raw sends
By default, zfs send sends the most generic, backwards-compatible data stream format (a so-called 'plain send'). If the sending filesystem uses newer features, e.g. compression or encryption, zfs send has to un-do these operations on the fly to produce the plain send stream. If the receiver uses newer features (e.g. compression or encryption inherited from the parent FS), it applies the necessary transformations again on the fly during zfs recv. Flags such as -e, -c and -L tell ZFS to produce a send stream that is closer to how the data is stored on disk. Sending with those flags removes computational overhead from sender and receiver. However, the receiver will not apply certain transformations, e.g., it will not compress with the receive-side compression algorithm. The -w (--raw) flag produces a send stream that is as raw as possible. For unencrypted datasets, its current effect is the same as -Lce. Encrypted datasets can only be sent plain (unencrypted) or raw (encrypted) using the -w flag.
Resumable Send & Recv
The -s flag for zfs recv tells zfs to save the partially received send stream in case it is interrupted. To resume the replication, the receiving side filesystem's receive_resume_token must be passed to a new zfs send -t <value> | zfs recv command. A full send can only be resumed if @to still exists. An incremental send can only be resumed if @to still exists and either @from still exists or a bookmark #fbm of @from still exists.
ZFS Holds
ZFS holds prevent a snapshot from being deleted through zfs destroy, letting the destroy fail with a dataset is busy error. Holds are created and referred to by a tag. They can be thought of as a named, persistent lock on the snapshot.
ZFS Abstractions Managed By zrepl
With the background knowledge from the previous paragraphs, we now summarize the different on-disk ZFS objects that zrepl manages to provide its functionality. Placeholder filesystems on the receiving side are regular ZFS filesystems with the ZFS property zrepl:placeholder=on. Placeholders allow the receiving side to mirror the sender's ZFS dataset hierarchy without replicating every filesystem at every intermediary dataset path component.
Consider the following example: S/H/J shall be replicated to R/sink/job/S/H/J, but neither S/H nor S shall be replicated. ZFS requires the existence of R/sink/job/S and R/sink/job/S/H in order to receive into R/sink/job/S/H/J. Thus, zrepl creates the parent filesystems as placeholders on the receiving side. If at some point S/H and S shall be replicated, the receiving side invalidates the placeholder flag automatically. The zrepl test placeholder command can be used to check whether a filesystem is a placeholder. The replication cursor bookmark and last-received-hold are managed by zrepl to ensure that future replications can always be done incrementally. The replication cursor is a send-side bookmark of the most recent successfully replicated snapshot, and the last-received-hold is a hold of that snapshot on the receiving side. Both are moved atomically after the receiving side has confirmed that a replication step is complete. The replication cursor has the format #zrepl_CUSOR_G_<GUID>_J_<JOBNAME>. The last-received-hold tag has the format zrepl_last_received_J_<JOBNAME>. Encoding the job name in the names ensures that multiple sending jobs can replicate the same filesystem to different receivers without interference. Tentative replication cursor bookmarks are short-lived bookmarks that protect the atomic moving-forward of the replication cursor and last-received-hold (see this issue). They are only necessary if step holds are not used as per the replication.protection setting. The tentative replication cursor has the format #zrepl_CUSORTENTATIVE_G_<GUID>_J_<JOBNAME>. The zrepl zfs-abstraction list command provides a listing of all bookmarks and holds managed by zrepl. Step holds are zfs holds managed by zrepl to ensure that a replication step can always be resumed if it is interrupted, e.g., due to network outage. zrepl creates step holds before it attempts a replication step and releases them after the receiver confirms that the replication step is complete. For an initial replication full @initial_snap, zrepl puts a zfs hold on @initial_snap. For an incremental send @from -> @to, zrepl puts a zfs hold on both @from and @to. Note that @from is not strictly necessary for resumability -- a bookmark on the sending side would be sufficient --, but size-estimation in currently used OpenZFS versions only works if @from is a snapshot. The hold tag has the format zrepl_STEP_J_<JOBNAME>. A job only ever has one active send per filesystem. Thus, there are never more than two step holds for a given pair of (job,filesystem). Step bookmarks are zrepl's equivalent for holds on bookmarks (ZFS does not support putting holds on bookmarks). They are intended for a situation where a replication step uses a bookmark #bm as incremental from where #bm is not managed by zrepl. To ensure resumability, zrepl copies #bm to step bookmark #zrepl_STEP_G_<GUID>_J_<JOBNAME>. If the replication is interrupted and #bm is deleted by the user, the step bookmark remains as an incremental source for the resumable send. Note that zrepl does not yet support creating step bookmarks because the corresponding ZFS feature for copying bookmarks is not yet widely available . Subscribe to zrepl issue #326 for details. The zrepl zfs-abstraction list command provides a listing of all bookmarks and holds managed by zrepl. NOTE: More details can be found in the design document
replication/design.md.
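To illustrate the ZFS mechanisms described in the background section above (bookmarks as incremental sources, resumable receive, and holds), here is a minimal sketch using hypothetical datasets pool/fs and backup/fs; it is not something zrepl runs verbatim:

# full send establishes the filesystem on the receiving side
zfs snapshot pool/fs@from
zfs send pool/fs@from | zfs recv -s backup/fs

# a bookmark can replace @from on the sending side for future incremental sends
zfs bookmark pool/fs@from pool/fs#bm_from
zfs snapshot pool/fs@to
zfs send -i '#bm_from' pool/fs@to | zfs recv -s backup/fs

# -s makes an interrupted receive resumable via the receive_resume_token property
zfs get -H -o value receive_resume_token backup/fs
# zfs send -t <token> | zfs recv -s backup/fs

# holds prevent zfs destroy of a snapshot until they are released
zfs hold example_tag pool/fs@to
zfs release example_tag pool/fs@to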
Limitations
ATTENTION: Currently, zrepl does not replicate filesystem properties. When receiving a filesystem, it is never mounted (-u flag) and mountpoint=none is set. This is temporary and is being worked on in issue #24.
Multiple Jobs & More than 2 MachinesMost users are served well with a single sender and a single receiver job. This section documents considerations for more complex setups.ATTENTION: Before you continue, make sure you have a working
understanding of how zrepl works and what zrepl does to ensure
that replication between sender and receiver is always possible without
conflicts. This will help you understand why certain kinds of multi-machine
setups do not (yet) work.
NOTE: If you can't find your desired configuration, have
questions or would like to see improvements to multi-job setups, please
open an issue on GitHub.
Multiple Jobs on one MachineAs a general rule, multiple jobs configured on one machine must operate on disjoint sets of filesystems. Otherwise, concurrently running jobs might interfere when operating on the same filesystem.On your setup, ensure that
Exceptions to the rule:
More Than 2 MachinesThis section might be relevant to users who wish to fan-in (N machines replicate to 1) or fan-out (replicate 1 machine to N machines).Working setups:
Setups that do not work:
Job Types in DetailJob Type push
Example config: config/samples/push.yml Job Type sink
Example config: config/samples/sink.yml Job Type pull
Example config: config/samples/pull.yml Job Type source
Example config: config/samples/source.yml Local replicationIf you have the need for local replication (most likely between two local storage pools), you can use the local transport type to connect a local push job to a local sink job.Example config: config/samples/local.yml. Job Type snap (snapshot & prune only)Job type that only takes snapshots and performs pruning on the local machine.
Example config: config/samples/snap.yml TransportsThe zrepl RPC layer uses transports to establish a single, bidirectional data stream between an active and passive job. On the passive (serving) side, the transport also provides the client identity to the upper layers: this string is used for access control and separation of filesystem sub-trees in sink jobs. Transports are specified in the connect or serve section of a job definition.Contents
ATTENTION: The client identities must be valid ZFS dataset path
components because the sink job uses ${root_fs}/${client_identity}
to determine the client's subtree.
tcp TransportThe tcp transport uses plain TCP, which means that the data is not encrypted on the wire. Clients are identified by their IPv4 or IPv6 addresses, and the client identity is established through a mapping on the server.This transport may also be used in conjunction with network-layer encryption and/or VPN tunnels to provide encryption on the wire. To make the IP-based client authentication effective, such solutions should provide authenticated IP addresses. Some options to consider:
Servejobs: - type: sink serve: type: tcp listen: ":8888" listen_freebind: true # optional, default false clients: { "192.168.122.123" : "mysql01", "192.168.122.42" : "mx01", "2001:0db8:85a3::8a2e:0370:7334": "gateway", # CIDR masks require a '*' in the client identity string # that is expanded to the client's IP address "10.23.42.0/24": "cluster-*" "fde4:8dba:82e1::/64": "san-*" } ... listen_freebind controls whether the socket is allowed to bind to non-local or unconfigured IP addresses (Linux IP_FREEBIND , FreeBSD IP_BINDANY). Enable this option if you want to listen on a specific IP address that might not yet be configured when the zrepl daemon starts. Connectjobs: - type: push connect: type: tcp address: "10.23.42.23:8888" dial_timeout: # optional, default 10s ... tls TransportThe tls transport uses TCP + TLS with client authentication using client certificates. The client identity is the common name (CN) presented in the client certificate.It is recommended to set up a dedicated CA infrastructure for this transport, e.g. using OpenVPN's EasyRSA. For a simple 2-machine setup, mutual TLS might also be sufficient. We provide copy-pastable instructions to generate the certificates below. The implementation uses Go's TLS library. Since Go binaries are statically linked, you or your distribution need to recompile zrepl when vulnerabilities in that library are disclosed. All file paths are resolved relative to the zrepl daemon's working directory. Specify absolute paths if you are unsure what directory that is (or find out from your init system). If intermediate CAs are used, the full chain must be present in either in the ca file or the individual cert files. Regardless, the client's certificate must be first in the cert file, with each following certificate directly certifying the one preceding it (see TLS's specification). This is the common default when using a CA management tool. NOTE: As of Go 1.15 (zrepl 0.3.0 and newer), the Go TLS / x509
library requires that Subject Alternative Names be present in certificates.
You might need to re-generate your certificates using one of the two
alternatives provided below.
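To check whether an existing certificate already carries a Subject Alternative Name, something like the following can be used (the certificate path is an example; the -ext option needs a reasonably recent OpenSSL, otherwise fall back to -text):

openssl x509 -in /etc/zrepl/prod.crt -noout -subject -ext subjectAltName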
Note further that zrepl continues to use the CommonName field to assign client identities. Hence, we recommend to keep the Subject Alternative Name and the CommonName in sync. Servejobs: - type: sink root_fs: "pool2/backup_laptops" serve: type: tls listen: ":8888" listen_freebind: true # optional, default false ca: /etc/zrepl/ca.crt cert: /etc/zrepl/prod.fullchain key: /etc/zrepl/prod.key client_cns: - "laptop1" - "homeserver" The ca field specified the certificate authority used to validate client certificates. The client_cns list specifies a list of accepted client common names (which are also the client identities for this transport). The listen_freebind field is explained here. Connectjobs: - type: pull connect: type: tls address: "server1.foo.bar:8888" ca: /etc/zrepl/ca.crt cert: /etc/zrepl/backupserver.fullchain key: /etc/zrepl/backupserver.key server_cn: "server1" dial_timeout: # optional, default 10s The ca field specifies the CA which signed the server's certificate (serve.cert). The server_cn specifies the expected common name (CN) of the server's certificate. It overrides the hostname specified in address. The connection fails if either do not match. Mutual-TLS between Two MachinesHowever, for a two-machine setup, self-signed certificates distributed using an out-of-band mechanism will also work just fine:Suppose you have a push-mode setup, with backups.example.com running the sink job, and prod.example.com running the push job. Run the following OpenSSL commands on each host, substituting HOSTNAME in both filenames and the interactive input prompt by OpenSSL: (name=HOSTNAME; openssl req -x509 -sha256 -nodes \ -newkey rsa:4096 \ -days 365 \ -keyout $name.key \ -out $name.crt -addext "subjectAltName = DNS:$name" -subj "/CN=$name") Now copy each machine's HOSTNAME.crt to the other machine's /etc/zrepl/HOSTNAME.crt, for example using scp. The serve & connect configuration will thus look like the following: # on backups.example.com - type: sink serve: type: tls listen: ":8888" ca: "/etc/zrepl/prod.example.com.crt" cert: "/etc/zrepl/backups.example.com.crt" key: "/etc/zrepl/backups.example.com.key" client_cns: - "prod.example.com" ... # on prod.example.com - type: push connect: type: tls address:"backups.example.com:8888" ca: /etc/zrepl/backups.example.com.crt cert: /etc/zrepl/prod.example.com.crt key: /etc/zrepl/prod.example.com.key server_cn: "backups.example.com" ... Certificate Authority using EasyRSAFor more than two machines, it might make sense to set up a CA infrastructure. Tools like EasyRSA make this very easy:#!/usr/bin/env bash set -euo pipefail HOSTS=(backupserver prod1 prod2 prod3) curl -L https://github.com/OpenVPN/easy-rsa/releases/download/v3.0.7/EasyRSA-3.0.7.tgz > EasyRSA-3.0.7.tgz echo "157d2e8c115c3ad070c1b2641a4c9191e06a32a8e50971847a718251eeb510a8 EasyRSA-3.0.7.tgz" | sha256sum -c rm -rf EasyRSA-3.0.7 tar -xf EasyRSA-3.0.7.tgz cd EasyRSA-3.0.7 ./easyrsa ./easyrsa init-pki ./easyrsa build-ca nopass for host in "${HOSTS[@]}"; do ./easyrsa build-serverClient-full $host nopass echo cert for host $host available at pki/issued/$host.crt echo key for host $host available at pki/private/$host.key done echo ca cert available at pki/ca.crt ssh+stdinserver Transportssh+stdinserver uses the ssh command and some features of the server-side SSH authorized_keys file. It is less efficient than other transports because the data passes through two more pipes. 
However, it is fairly convenient to set up and allows the zrepl daemon to not be directly exposed to the internet, because all traffic passes through the system's SSH server.The concept is inspired by git shell and Borg Backup. The implementation is provided by the Go package github.com/problame/go-netssh. NOTE: ssh+stdinserver generally provides inferior error
detection and handling compared to the tcp and tls transports.
When encountering such problems, consider using tcp or tls
transports, or help improve package go-netssh.
Servejobs: - type: source serve: type: stdinserver client_identities: - "client1" - "client2" ... First of all, note that type=stdinserver in this case: Currently, only connect.type=ssh+stdinserver can connect to a serve.type=stdinserver, but we want to keep that option open for future extensions. The serving job opens a UNIX socket named after client_identity in the runtime directory. In our example above, that is /var/run/zrepl/stdinserver/client1 and /var/run/zrepl/stdinserver/client2. On the same machine, the zrepl stdinserver $client_identity command connects to /var/run/zrepl/stdinserver/$client_identity. It then passes its stdin and stdout file descriptors to the zrepl daemon via cmsg(3). zrepl daemon in turn combines them into an object implementing net.Conn: a Write() turns into a write to stdout, a Read() turns into a read from stdin. Interactive use of the stdinserver subcommand does not make much sense. However, we can force its execution when a user with a particular SSH pubkey connects via SSH. This can be achieved with an entry in the authorized_keys file of the serving zrepl daemon. # for OpenSSH >= 7.2 command="zrepl stdinserver CLIENT_IDENTITY",restrict CLIENT_SSH_KEY # for older OpenSSH versions command="zrepl stdinserver CLIENT_IDENTITY",no-port-forwarding,no-X11-forwarding,no-pty,no-agent-forwarding,no-user-rc CLIENT_SSH_KEY
NOTE: You may need to adjust the PermitRootLogin option
in /etc/ssh/sshd_config to forced-commands-only or higher for
this to work. Refer to sshd_config(5) for details.
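A minimal sketch of the relevant sshd_config excerpt, followed by a config test and reload (service management differs per OS):

# /etc/ssh/sshd_config (excerpt)
PermitRootLogin forced-commands-only

# validate the configuration and reload sshd
sshd -t && service sshd reload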
To recap, this is how client authentication works with the ssh+stdinserver transport:
Connectjobs: - type: pull connect: type: ssh+stdinserver host: prod.example.com user: root port: 22 identity_file: /etc/zrepl/ssh/identity # options: # optional, default [], `-o` arguments passed to ssh # - "Compression=yes" # dial_timeout: 10s # optional, default 10s, max time.Duration until initial handshake is completed The connecting zrepl daemon
As discussed in the section above, the connecting zrepl daemon expects that zrepl stdinserver $client_identity is executed automatically via an authorized_keys file entry. The known_hosts file used by the ssh command must contain an entry for connect.host prior to starting zrepl. Thus, run the following on the pulling host's command line (substituting connect.host): ssh -i /etc/zrepl/ssh/identity root@prod.example.com NOTE: The environment variables of the underlying SSH process
are cleared. $SSH_AUTH_SOCK will not be available. It is suggested to
create a separate, unencrypted SSH key solely for that purpose.
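For example, generating such a dedicated, unencrypted key and priming known_hosts might look like this (substitute the host from connect.host):

ssh-keygen -t ed25519 -N '' -C zrepl-replication -f /etc/zrepl/ssh/identity
ssh -i /etc/zrepl/ssh/identity root@prod.example.com   # accept the host key once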
local TransportThe local transport can be used to implement local replication, i.e., push replication between a push and sink job defined in the same configuration file.The listener_name is analogous to a hostname and must match between serve and connect. The client_identity is used by the sink as documented above. jobs: - type: sink serve: type: local listener_name: localsink ... - type: push connect: type: local listener_name: localsink client_identity: local_backup dial_timeout: 2s # optional, 0 for no timeout ... Filter SyntaxFor source, push and snap jobs, a filesystem filter must be defined (field filesystems). A filter takes a filesystem path (in the ZFS filesystem hierarchy) as parameter and returns true (pass) or false (block).A filter is specified as a YAML dictionary with patterns as keys and booleans as values. The following rules determine which result is chosen for a given filesystem path:
The subtree wildcard < means "the dataset left of < and all its children". TIP: You can try out patterns for a configured job using the
zrepl test filesystems subcommand for push and source jobs.
Examples
Full Access
The following configuration will allow access to all filesystems.

jobs:
- type: source
  filesystems: {
    "<": true,
  }
  ...

Fine-grained
The following configuration demonstrates all rules presented above.

jobs:
- type: source
  filesystems: {
    "tank<": true,         # rule 1
    "tank/foo<": false,    # rule 2
    "tank/foo/bar": true,  # rule 3
  }
  ...

Which rule applies to a given path, and what is the result?

tank/foo/bar/loo => 2 false
tank/bar         => 1 true
tank/foo/bar     => 3 true
zroot            => NONE false
tank/var/log     => 1 true

Send & Recv Options
Send Options
Source and push jobs have an optional send configuration section.

jobs:
- type: push
  filesystems: ...
  send:
    # flags from the table below go here
  ...

The following table specifies the list of (boolean) options. Flags with an entry in the zfs send column map directly to the zfs send CLI flags. zrepl does not perform feature checks for these flags. If you enable a flag that is not supported by the installed version of ZFS, the zfs error will show up at runtime in the logs and zrepl status. See the upstream man page (man zfs-send) for their semantics.
encryptedThe encrypted option controls whether the matched filesystems are sent as OpenZFS native encryption raw sends. More specifically, if encrypted=true, zrepl
Filesystems matched by filesystems that are not encrypted are not sent and will cause error log messages. If encrypted=false, zrepl expects that filesystems matching filesystems are not encrypted or have loaded encryption keys. NOTE: Use encrypted instead of raw to make your
intent clear that zrepl must only replicate filesystems that are actually
encrypted by OpenZFS native encryption. It is meant as a safeguard to prevent
unintended sends of unencrypted filesystems in raw mode.
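As a hedged sketch (the dataset name is hypothetical), a push job that raw-sends only an encrypted subtree could look like this:

jobs:
- type: push
  filesystems: {
    "tank/secret<": true   # must be OpenZFS-native-encrypted datasets when encrypted: true
  }
  send:
    encrypted: true
  ...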
propertiesSends the dataset properties along with snapshots. Please be careful with this option and read the note on property replication below.backup_propertiesWhen properties are modified on a filesystem that was received from a send stream with send.properties=true, ZFS archives the original received value internally. This also applies to inheriting or overriding properties during zfs receive.When sending those received filesystems another hop, the backup_properties flag instructs ZFS to send the original property values rather than the current locally set values. This is useful for replicating properties across multiple levels of backup machines. Example: Suppose we want to flow snapshots from Machine A to B, then from B to C. A will enable the properties send option. B will want to override critical properties such as mountpoint or canmount. But the job that replicates from B to C should be sending the original property values received from A. Thus, B sets the backup_properties option. Please be careful with this option and read the note on property replication below. large_blocksThis flag should not be changed after initial replication. Prior to OpenZFS commit 7bcb7f08 it was possible to change this setting which resulted in data loss on the receiver. The commit in question is included in OpenZFS 2.0 and works around the problem by prohibiting receives of incremental streams with a flipped setting.WARNING: This bug has not been fixed in the OpenZFS 0.8
releases which means that changing this flag after initial replication
might cause data loss on the receiver.
Recv OptionsSink and pull jobs have an optional recv configuration section:jobs: - type: pull recv: properties: inherit: - "mountpoint" override: { "org.openzfs.systemd:ignore": "on" } bandwidth_limit: ... placeholder: encryption: unspecified | off | inherit ... Jump to properties , bandwidth_limit , and placeholder. propertiesoverride maps directly to the zfs recv -o flag. Property name-value pairs specified in this map will apply to all received filesystems, regardless of whether the send stream contains properties or not.inherit maps directly to the zfs recv -x flag. Property names specified in this list will be inherited from the receiving side's parent filesystem (e.g. root_fs). With both options, the sending side's property value is still stored on the receiver, but the local override or inherit is the one that takes effect. You can send the original properties from the first receiver to another receiver using send.backup_properties. A Note on Property ReplicationIf a send stream contains properties, as per send.properties or send.backup_properties, the default ZFS behavior is to use those properties on the receiving side, verbatim.In many use cases for zrepl, this can have devastating consequences. For example, when backing up a filesystem that has mountpoint=/ to a storage server, that storage server's root filesystem will be shadowed by the received file system on some platforms. Also, many scripts and tools use ZFS user properties for configuration and do not check the property source (local vs. received). If they are installed on the receiving side as well as the sending side, property replication could have unintended effects. zrepl currently does not provide any automatic safe-guards for property replication:
Below is a non-exhaustive list of problematic properties. Please open a pull request if you find a property that is missing from this list (both with regard to core ZFS tools and other software in the broader ecosystem). Mount behaviour
Note: inheriting or overriding the mountpoint property on ZVOLs fails in zfs recv. This is an issue in OpenZFS . As a workaround, consider creating separate zrepl jobs for your ZVOL and filesystem datasets. Please comment at zrepl issue #430 if you encounter this issue and/or would like zrepl to automatically work around it. SystemdWith systemd, you should also consider the properties processed by the zfs-mount-generator .Most notably:
EncryptionIf the sender filesystems are encrypted but the sender does plain sends and property replication is enabled, the receiver must inherit the following properties:
Placeholdersplaceholder: encryption: unspecified | off | inherit During replication, zrepl creates placeholder datasets on the receiving side if the sending side's filesystems filter creates gaps in the dataset hierarchy. This is generally fully transparent to the user. However, with OpenZFS Native Encryption, placeholders require zrepl user attention. Specifically, the problem is that, when zrepl attempts to create the placeholder dataset on the receiver, and that placeholder's parent dataset is encrypted, ZFS wants to inherit encryption to the placeholder. This is relevant to two use cases that zrepl supports:
For encrypted-send-to-untrusted-receiver, the placeholder datasets need to be created with -o encryption=off. Without it, creation would fail with an error, indicating that the placeholder's parent dataset's key needs to be loaded. But we don't trust the receiver, so we can't expect that to ever happen. However, for send-plain-encrypt-on-receive, we cannot set -o encryption=off. The reason is that if we did, any of the (non-placeholder) child datasets below the placeholder would inherit encryption=off, thereby silently breaking our encrypt-on-receive use case. So, to cover this use case, we need to create placeholders without specifying -o encryption. This will make zfs create inherit the encryption mode from the parent dataset, and thereby transitively from root_fs. The zrepl config provides the recv.placeholder.encryption knob to control this behavior. In unspecified mode (the default), placeholder creation bails out and asks the user to configure a behavior. In off mode, the placeholder is created with encryption=off, i.e., the encrypted-send-to-untrusted-receiver use case. In inherit mode, the placeholder is created without specifying -o encryption at all, i.e., the send-plain-encrypt-on-receive use case.
Common Options
Bandwidth Limit (send & recv)

bandwidth_limit:
  max: 23.5 MiB # -1 is the default and disables rate limiting
  bucket_capacity: # token bucket capacity in bytes; defaults to 128KiB

Both send and recv can be limited to a maximum bandwidth through bandwidth_limit. For most users, it should be sufficient to just set bandwidth_limit.max. The bandwidth_limit.bucket_capacity refers to the token bucket size. The bandwidth limit only applies to the payload data, i.e., the ZFS send stream. It does not account for transport protocol overheads. The scope is the job level, i.e., all concurrent sends or incoming receives of a job share the bandwidth limit.
Replication Options

jobs:
- type: push
  filesystems: ...
  replication:
    protection:
      initial: guarantee_resumability     # guarantee_{resumability,incremental,nothing}
      incremental: guarantee_resumability # guarantee_{resumability,incremental,nothing}
    concurrency:
      size_estimates: 4
      steps: 1
  ...

protection option
The protection variable controls the degree to which a replicated filesystem is protected from getting out of sync through a zrepl pruner or external tools that destroy snapshots. zrepl can guarantee resumability or just incremental replication.
guarantee_resumability is the default value and guarantees that a replication step is always resumable and that incremental replication will always be possible. The implementation uses replication cursors, last-received-hold and step holds.
guarantee_incremental only guarantees that incremental replication will always be possible. If a step from -> to is interrupted and its to snapshot is destroyed, zrepl will remove the half-received to's resume state and start a new step from -> to2. The implementation uses replication cursors, tentative replication cursors and last-received-hold.
guarantee_nothing does not make any guarantees with regards to keeping sending and receiving side in sync. No bookmarks or holds are created to protect sender and receiver from diverging.
Tradeoffs
Using guarantee_incremental instead of guarantee_resumability obviously removes the resumability guarantee. This means that replication progress is no longer monotonic, which might lead to a replication setup that never makes progress if mid-step interruptions are too frequent (e.g. frequent network outages).
However, the advantage and reason for existence of the incremental mode is that it allows the pruner to delete snapshots of interrupted replication steps which is useful if replication happens so rarely (or fails so frequently) that the amount of disk space exclusively referenced by the step's snapshots becomes intolerable. NOTE: When changing this flag, obsoleted zrepl-managed
bookmarks and holds will be destroyed on the next replication step that is
attempted for each filesystem.
concurrency optionThe concurrency options control the maximum amount of concurrency during replication. The default values allow some concurrency during size estimation but no parallelism for the actual replication.
Note that initial replication cannot start replicating child filesystems before the parent filesystem's initial replication step has completed. Some notes on tuning these values:
Taking Snapshots
The push, source and snap jobs can automatically take periodic snapshots of the filesystems matched by the filesystems filter field. The snapshot names are composed of a user-defined prefix followed by a UTC date formatted like 20060102_150405_000. We use UTC because it avoids name conflicts when switching time zones or between summer and winter time. When a job is started, the snapshotter attempts to get the snapshotting rhythms of the matched filesystems in sync, because snapshotting all filesystems at the same time results in a more consistent backup. To find that sync point, the most recent snapshot made by the snapshotter in any of the matched filesystems is used. A filesystem that does not have snapshots by the snapshotter has lower priority than filesystems that do, and thus might not be snapshotted (and replicated) until it is snapshotted at the next sync point. For push jobs, replication is automatically triggered after all filesystems have been snapshotted. Note that the zrepl signal wakeup JOB subcommand does not trigger snapshotting.

jobs:
- type: push
  filesystems: {
    "<": true,
    "tmp": false
  }
  snapshotting:
    type: periodic
    prefix: zrepl_
    interval: 10m
    hooks: ...
  ...

There is also a manual snapshotting type, which covers the following use cases:
Note that you will have to trigger replication manually using the zrepl signal wakeup JOB subcommand in that case.

jobs:
- type: push
  filesystems: {
    "<": true,
    "tmp": false
  }
  snapshotting:
    type: manual
  ...

Pre- and Post-Snapshot Hooks
Jobs with periodic snapshots can run hooks before and/or after taking the snapshot, specified in snapshotting.hooks. Hooks are called per filesystem before and after the snapshot is taken (pre- and post-edge). Pre-edge invocations are in configuration order, post-edge invocations in reverse order, i.e. like a stack. If a pre-snapshot invocation fails, err_is_fatal=true cuts off subsequent hooks, does not take a snapshot, and only invokes post-edges corresponding to previous successful pre-edges. err_is_fatal=false logs the failed pre-edge invocation but does not affect subsequent hooks nor snapshotting itself. Post-edges are only invoked for hooks whose pre-edges ran without error. Note that hook failures for one filesystem never affect other filesystems. The optional timeout parameter specifies a period after which zrepl will kill the hook process and report an error. The default is 30 seconds and may be specified in any units understood by time.ParseDuration. The optional filesystems filter limits the filesystems the hook runs for; it uses the same filter specification as jobs. Most hook types take additional parameters, please refer to the respective subsections below.
command Hooksjobs: - type: push filesystems: { "<": true, "tmp": false } snapshotting: type: periodic prefix: zrepl_ interval: 10m hooks: - type: command path: /etc/zrepl/hooks/zrepl-notify.sh timeout: 30s err_is_fatal: false - type: command path: /etc/zrepl/hooks/special-snapshot.sh filesystems: { "tank/special": true } ... command hooks take a path to an executable script or binary to be executed before and after the snapshot. path must be absolute (e.g. /etc/zrepl/hooks/zrepl-notify.sh). No arguments may be specified; create a wrapper script if zrepl must call an executable that requires arguments. The process standard output is logged at level INFO. Standard error is logged at level WARN. The following environment variables are set:
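As an illustration of how a hook can use these variables, a minimal notification hook might look like the following sketch. The ZREPL_* variable names are assumptions based on the shipped template script (referenced below); verify them against that template before relying on this script.

#!/bin/sh
# /etc/zrepl/hooks/zrepl-notify.sh -- minimal sketch of a command hook.
# Variable names are assumed from config/samples/hooks/template.sh; verify there.
set -eu
case "${ZREPL_HOOKTYPE:-}" in
    pre_snapshot)
        logger -t zrepl-hook "about to snapshot ${ZREPL_FS}@${ZREPL_SNAPNAME}"
        ;;
    post_snapshot)
        logger -t zrepl-hook "took snapshot ${ZREPL_FS}@${ZREPL_SNAPNAME}"
        ;;
    *)
        # unknown hook type: exit 0 so future hook types do not break snapshotting
        ;;
esac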
An empty template hook can be found in config/samples/hooks/template.sh. postgres-checkpoint HookConnects to a Postgres server and executes the CHECKPOINT statement pre-snapshot. Checkpointing applies the WAL contents to all data files and syncs the data files to disk. This is not required for a consistent database backup: it merely forward-pays the "cost" of WAL replay to the time of snapshotting instead of at restore time. However, the Postgres manual recommends against checkpointing during normal operation. Further, the operation requires Postgres superuser privileges. zrepl users must decide on their own whether this hook is useful for them (it likely isn't).ATTENTION: Note that the WAL files and the Postgres data directory (with all database data files) must be on the same filesystem to guarantee a correct point-in-time backup with the ZFS snapshot.
The DSN syntax is documented here: https://godoc.org/github.com/lib/pq CREATE USER zrepl_checkpoint PASSWORD yourpasswordhere; ALTER ROLE zrepl_checkpoint SUPERUSER; - type: postgres-checkpoint dsn: "host=localhost port=5432 user=postgres password=yourpasswordhere sslmode=disable" filesystems: { "p1/postgres/data11": true } mysql-lock-tables HookConnects to MySQL and executes FLUSH TABLES WITH READ LOCK before the snapshot and UNLOCK TABLES after it.
The above procedure is documented in the MySQL manual as a means to produce a consistent backup of a MySQL DBMS installation (i.e., all databases). DSN syntax: [username[:password]@][protocol[(address)]]/dbname[?param1=value1&...&paramN=valueN] ATTENTION: All MySQL databases must be on the same ZFS filesystem to
guarantee a consistent point-in-time backup with the ZFS snapshot.
CREATE USER zrepl_lock_tables IDENTIFIED BY 'yourpasswordhere'; GRANT RELOAD ON *.* TO zrepl_lock_tables; FLUSH PRIVILEGES; - type: mysql-lock-tables dsn: "zrepl_lock_tables:yourpasswordhere@tcp(localhost)/" filesystems: { "tank/mysql": true } Pruning PoliciesIn zrepl, pruning means destroying snapshots. Pruning must happen on both sides of a replication or the systems would inevitably run out of disk space at some point.Typically, the requirements to temporal resolution and maximum retention time differ per side. For example, when using zrepl to back up a busy database server, you will want high temporal resolution (snapshots every 10 min) for the last 24h in case of administrative disasters, but cannot afford to store them for much longer because you might have high turnover volume in the database. On the receiving side, you may have more disk space available, or need to comply with other backup retention policies. zrepl uses a set of keep rules per sending and receiving side to determine which snapshots shall be kept per filesystem. A snapshot that is not kept by any rule is destroyed. The keep rules are evaluated on the active side (push or pull job) of the replication setup, for both active and passive side, after replication completed or was determined to have failed permanently. Example Configuration: jobs: - type: push name: ... connect: ... filesystems: { "<": true, "tmp": false } snapshotting: type: periodic prefix: zrepl_ interval: 10m pruning: keep_sender: - type: not_replicated # make sure manually created snapshots by the administrator are kept - type: regex regex: "^manual_.*" - type: grid grid: 1x1h(keep=all) | 24x1h | 14x1d regex: "^zrepl_.*" keep_receiver: - type: grid grid: 1x1h(keep=all) | 24x1h | 35x1d | 6x30d regex: "^zrepl_.*" # manually created snapshots will be kept forever on receiver - type: regex regex: "^manual_.*" DANGER: You might have existing snapshots of filesystems
affected by pruning which you want to keep, i.e. not be destroyed by zrepl.
Make sure to actually add the necessary regex keep rules on both sides,
like with manual in the example above.
Policy not_replicatedjobs: - type: push pruning: keep_sender: - type: not_replicated ... not_replicated keeps all snapshots that have not been replicated to the receiving side. It only makes sense to specify this rule for the keep_sender. The reason is that, by definition, all snapshots on the receiver have already been replicated to there from the sender. To determine whether a sender-side snapshot has already been replicated, zrepl uses the replication cursor bookmark which corresponds to the most recent successfully replicated snapshot. Policy gridjobs: - type: pull pruning: keep_receiver: - type: grid regex: "^zrepl_.*" grid: 1x1h(keep=all) | 24x1h | 35x1d | 6x30d │ │ │ └─ 1 repetition of a one-hour interval with keep=all │ │ └─ 24 repetitions of a one-hour interval with keep=1 │ └─ 6 repetitions of a 30-day interval with keep=1 ... The retention grid can be thought of as a time-based sieve that thins out snapshots as they get older. The grid field specifies a list of adjacent time intervals. Each interval is a bucket with a maximum capacity of keep snapshots. The following procedure happens during pruning:
The syntax to describe the bucket list is as follows: Repeat x Duration (keep=all)
Example: Assume the following grid specification: grid: 1x1h(keep=all) | 2x2h | 1x3h This grid specification produces the following constellation of buckets: 0h 1h 2h 3h 4h 5h 6h 7h 8h 9h | | | | | | | | | | |-Bucket1-|-----Bucket2-------|------Bucket3------|-----------Bucket4-----------| | keep=all| keep=1 | keep=1 | keep=1 | Now assume that we have a set of snapshots @a, @b, ..., @D. Snapshot @a is the most recent snapshot. Snapshot @D is the oldest snapshot, it is almost 9 hours older than snapshot @a. We place the snapshots on the same timeline as the buckets: 0h 1h 2h 3h 4h 5h 6h 7h 8h 9h | | | | | | | | | | |-Bucket1-|-----Bucket2-------|------Bucket3------|-----------Bucket4-----------| | keep=all| keep=1 | keep=1 | keep=1 | | | | | | | a b c | d e f g h i j k l m n o p |q r s t u v w x y z |A B C D We obtain the following mapping of snapshots to buckets: Bucket1: a,b,c Bucket2: d,e,f,g,h,i Bucket3: j,k,l,m,n,o,p Bucket4: q,r,s,t,u,v,w,x,y,z No bucket: A,B,C,D For each bucket, we now prune snapshots until it only contains `keep` snapshots. Newer snapshots are destroyed first. Snapshots that do not fall into a bucket are always destroyed. Result after pruning: 0h 1h 2h 3h 4h 5h 6h 7h 8h 9h | | | | | | | | | | |-Bucket1-|-----Bucket2-------|------Bucket3------|-----------Bucket4-----------| | | | | | | a b c | i | p | z | Policy last_njobs: - type: push pruning: keep_receiver: - type: last_n count: 10 regex: ^zrepl_.*$ # optional ... last_n filters the snapshot list by regex, then keeps the last count snapshots in that list (last = youngest = most recent creation date) All snapshots that don't match regex or exceed count in the filtered list are destroyed unless matched by other rules. Policy regexjobs: - type: push pruning: keep_receiver: # keep all snapshots with prefix zrepl_ or manual_ - type: regex regex: "^(zrepl|manual)_.*" - type: push snapshotting: prefix: zrepl_ pruning: keep_sender: # keep all snapshots that were not created by zrepl - type: regex negate: true regex: "^zrepl_.*" regex keeps all snapshots whose names are matched by the regular expression in regex. Like all other regular expression fields in prune policies, zrepl uses Go's regexp.Regexp Perl-compatible regular expressions (Syntax). The optional negate boolean field inverts the semantics: Use it if you want to keep all snapshots that do not match the given regex. Source-side snapshot pruningA source jobs takes snapshots on the system it runs on. The corresponding pull job on the replication target connects to the source job and replicates the snapshots. Afterwards, the pull job coordinates pruning on both sender (the source job side) and receiver (the pull job side).There is no built-in way to define and execute pruning on the source side independently of the pull side. The source job will continue taking snapshots which will not be pruned until the pull side connects. This means that extended replication downtime will fill up the source's zpool with snapshots. If the above is a conceivable situation for you, consider using push mode, where pruning happens on the same side where snapshots are taken. Workaround using snap jobAs a workaround (see GitHub issue #102 for development progress), a pruning-only snap job can be defined on the source side: The snap job is in charge of snapshot creation & destruction, whereas the source job's role is reduced to just serving snapshots. 
However, since jobs run independently of each other, it is possible that the snap job will prune snapshots that are queued for replication / destruction by the remote pull job that connects to the source job. Symptoms of such race conditions are spurious replication and destroy errors.Example configuration: # source side jobs: - type: snap snapshotting: type: periodic pruning: keep: # source side pruning rules go here ... - type: source snapshotting: type: manual root_fs: ... # pull side jobs: - type: pull pruning: keep_sender: # let the source-side snap job do the pruning - type: regex regex: ".*" ... keep_receiver: # feel free to prune on the pull side as desired ... Loggingzrepl uses structured logging to provide users with easily processable log messages.Logging outlets are configured in the global section of the config file. global: logging: - type: OUTLET_TYPE level: MINIMUM_LEVEL format: FORMAT - type: OUTLET_TYPE level: MINIMUM_LEVEL format: FORMAT ... jobs: ... ATTENTION: The first outlet is special: if an error writing
to any outlet occurs, the first outlet receives the error and can print it.
Thus, the first outlet must be the one that always works and does not block,
e.g. stdout, which is the default.
Default ConfigurationBy default, the following logging configuration is used: global: logging: - type: "stdout" level: "warn" format: "human" Building BlocksThe following sections document the semantics of the different log levels, formats, and outlet types.Levels
Incorrectly classified messages are considered a bug and should be reported. Formats
OutletsOutlets are the destination for log entries.stdout Outlet
Writes all log entries with minimum level level formatted by format to stdout. If stdout is a tty, interactive usage is assumed and both time and color are set to true. Can only be specified once. syslog Outlet
Writes all log entries formatted by format to syslog. On normal setups, you should not need to change the retry_interval. Can only be specified once. tcp Outlet
Establishes a TCP connection to address and sends log messages with minimum level level formatted by format. If tls is not specified, an unencrypted connection is established. If tls is specified, the TCP connection is secured with TLS + Client Authentication. The latter is particularly useful in combination with log aggregation services.
WARNING: zrepl drops log messages to the TCP outlet if the
underlying connection is not fast enough. Note that the kernel's TCP buffers must fill up before messages are dropped.
Make sure to always configure a stdout outlet as the special error outlet to be informed about problems with the TCP outlet (see above). NOTE: zrepl uses Go's crypto/tls and crypto/x509
packages and leaves all but the required fields in tls.Config at their
default values. In case of a security defect in these packages, zrepl has to
be rebuilt because Go binaries are statically linked.
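Putting the outlets together, a logging section that keeps a human-readable stdout outlet as the first (error) outlet, additionally ships entries to syslog, and forwards JSON-formatted entries to a TLS-protected log collector might look roughly like the following sketch. The host name and certificate paths are made up, and the exact field names should be verified against the full config documentation.

global:
  logging:
    # first outlet: must always work and never block (see above)
    - type: stdout
      level: warn
      format: human
    - type: syslog
      level: info
      format: human
      retry_interval: 10s
    - type: tcp
      level: debug
      format: json
      address: "logs.example.com:10514"   # made-up collector address
      retry_interval: 10s
      tls:                                 # omit this block for an unencrypted connection
        ca: /etc/zrepl/logserver.crt
        cert: /etc/zrepl/prod-log.crt
        key: /etc/zrepl/prod-log.key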
MonitoringMonitoring endpoints are configured in the global.monitoring section of the config file.Prometheus & Grafanazrepl can expose Prometheus metrics via HTTP. The listen attribute is a net.Listen string for tcp, e.g. :9811 or 127.0.0.1:9811 (port 9811 was reserved for zrepl on the official list). The listen_freebind attribute is explained here. The Prometheus monitoring job appears in the zrepl control job list and may be specified at most once.zrepl also ships with an importable Grafana dashboard that consumes the Prometheus metrics: see dist/grafana. The dashboard also contains some advice on which metrics are important to monitor. NOTE: At the time of writing, there is no stability guarantee
on the exported metrics.
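The zrepl-side configuration is shown below. On the Prometheus server, a minimal scrape job for that endpoint could look like the following sketch (the target host name is an assumption):

scrape_configs:
  - job_name: 'zrepl'
    static_configs:
      - targets: ['prod.example.com:9811']   # zrepl's monitoring listen address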
global: monitoring: - type: prometheus listen: ':9811' listen_freebind: true # optional, default false MiscellaneousRuntime Directories & UNIX SocketsThe zrepl daemon needs to open various UNIX sockets in a runtime directory:
There is no authentication on these sockets except the UNIX permissions. The zrepl daemon will refuse to bind any of the above sockets in a directory that is world-accessible. The following sections of the global config shows the default paths. The shell script below shows how the default runtime directory can be created. global: control: sockpath: /var/run/zrepl/control serve: stdinserver: sockdir: /var/run/zrepl/stdinserver mkdir -p /var/run/zrepl/stdinserver chmod -R 0700 /var/run/zrepl Durations & IntervalsInterval & duration fields in job definitions, pruning configurations, etc. must match the following regex:var durationStringRegex *regexp.Regexp = regexp.MustCompile(`^\s*(\d+)\s*(s|m|h|d|w)\s*$`) // s = second, m = minute, h = hour, d = day, w = week (7 days) Super-Verbose Job DebuggingYou have probably landed here because you opened an issue on GitHub and some developer told you to do this... So just read the annotated comments ;)job: - name: ... ... # JOB DEBUGGING OPTIONS # should be equal for all job types, but each job implements the debugging itself debug: conn: # debug the io.ReadWriteCloser connection read_dump: /tmp/connlog_read # dump results of Read() invocations to this file write_dump: /tmp/connlog_write # dump results of Write() invocations to this file rpc: # debug the RPC protocol implementation log: true # log output from rpc layer to the job log ATTENTION: Connection dumps will almost certainly contain your or
others' private data. Do not share them in a bug report.
UsageCLI OverviewNOTE:The zrepl binary is self-documenting: run zrepl
help for an overview of the available subcommands or zrepl SUBCOMMAND
--help for information on available flags, etc.
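For orientation, a few commonly used invocations are sketched below; double-check the subcommand names against zrepl help on your version:

zrepl daemon              # run the daemon (usually started via the init system)
zrepl status              # live view of job and replication progress
zrepl signal wakeup JOB   # wake up JOB immediately (does not trigger snapshotting)
zrepl configcheck         # parse and validate the configuration file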
zrepl daemonAll actual work zrepl does is performed by a daemon process. The daemon supports structured logging and provides monitoring endpoints.When installing from a package, the package maintainer should have provided an init script / systemd.service file. You should thus be able to start zrepl daemon using your init system. Alternatively, or for running zrepl in the foreground, simply execute zrepl daemon. Note that you won't see much output with the default logging configuration: ATTENTION: Make sure to actually monitor the error level output of
zrepl: some configuration errors will not make the daemon exit.
Example: if the daemon cannot create the transport-ssh+stdinserver sockets in the runtime directory, it will emit an error message but not exit because other tasks such as periodic snapshots & pruning are of equal importance. RestartingThe daemon handles SIGINT and SIGTERM for graceful shutdown. Graceful shutdown means at worst that a job will not be rescheduled for the next interval. The daemon exits as soon as all jobs have reported shut down.Systemd Unit FileA systemd service definition template is available in dist/systemd. Note that some of the options only work on recent versions of systemd. Any help & improvements are very welcome; see issue #145.Ops RunbooksMigrating Sending SideObjective: Move the sending-side zpool to new hardware. Make the move fully transparent to the sending-side jobs. After the move is done, all sending-side zrepl jobs should continue to work as if the move had not happened. In particular, incremental replication should be able to pick up where it left off before the move.Suppose we want to migrate all data from one zpool oldpool to another zpool newpool. A possible reason might be that we want to change RAID levels, ashift, or just migrate over to next-gen hardware. If the pool names are different, zrepl's matching between sender and receiver datasets will break because the receive-side dataset names contain oldpool. To avoid this, we will need the name of the new pool to match that of the old pool. The following steps will accomplish this:
Note that, depending on pruning rules, it will not be possible to switch back to the old pool seamlessly, i.e., without a full re-replication. Platform TestsAlong with the main zrepl binary, we release the platformtest binaries. The zrepl platform tests are an integration test suite that is complementary to the pure Go unit tests. Any test that needs to interact with ZFS is a platform test.The platform tests need to run as root. For each test, we create a fresh dummy zpool backed by a file-based vdev. The file path, and a root mountpoint for the dummy zpool, must be specified on the command line: mkdir -p /tmp/zreplplatformtest ./platformtest \ -poolname 'zreplplatformtest' \ # <- name must contain zreplplatformtest -imagepath /tmp/zreplplatformtest.img \ # <- zrepl will create the file -mountpoint /tmp/zreplplatformtest # <- must exist WARNING: platformtest will unconditionally overwrite the
file at imagepath and unconditionally zpool destroy $poolname.
So, don't use a production poolname, and consider running the test in a VM.
It'll be a lot faster as well because the underlying operations, zfs
list in particular, will be faster.
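For iterating on a single failing test, the basic invocation above can be combined with the selection flags described further below; for example (the test name regex is just an illustration):

sudo ./platformtest \
    -poolname 'zreplplatformtest' \
    -imagepath /tmp/zreplplatformtest.img \
    -mountpoint /tmp/zreplplatformtest \
    -run 'ReplicationCursor' \
    -failure.stop-and-keep-pool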
While the platformtests are running, there will be a lot of log output. After all tests have run, the test runner prints a summary with a list of tests, grouped by result type (success, failure, skipped): PASSING TESTS: github.com/zrepl/zrepl/platformtest/tests.BatchDestroy github.com/zrepl/zrepl/platformtest/tests.CreateReplicationCursor github.com/zrepl/zrepl/platformtest/tests.GetNonexistent github.com/zrepl/zrepl/platformtest/tests.HoldsWork ... github.com/zrepl/zrepl/platformtest/tests.SendStreamNonEOFReadErrorHandling github.com/zrepl/zrepl/platformtest/tests.UndestroyableSnapshotParsing SKIPPED TESTS: github.com/zrepl/zrepl/platformtest/tests.SendArgsValidationEncryptedSendOfUnencryptedDatasetForbidden__EncryptionSupported_false FAILED TESTS: [] If there is a failure, or a skipped test that you believe should be passing, re-run the test suite, capture stderr & stdout to a text file, and create an issue on GitHub. To run a specific test case, or a subset of tests matched by regex, use the -run REGEX command line flag. To stop test execution at the first failing test, and prevent cleanup of the dummy zpool, use the -failure.stop-and-keep-pool flag. To build the platformtests yourself, use make test-platform-bin. There's also the make test-platform target to run the platform tests with a default command line. Talks & Presentations
ChangelogThe changelog summarizes bugfixes that are deemed relevant for users and package maintainers. Developers should consult the git commit log or GitHub issue tracker.0.6 (Unreleased)
0.5
Note to all users: please read up on the following OpenZFS bugs, as you might be affected:
Finally, I'd like to point you to the GitHub discussion about which bugfixes and features should be prioritized in zrepl 0.6 and beyond! NOTE: zrepl is a spare-time project primarily developed by Christian Schwarz. You can support maintenance and feature development through one of the following services: Donate via Patreon Donate via GitHub Sponsors Donate via Liberapay Donate via PayPal Note that PayPal processing fees are relatively high for small donations. For SEPA wire transfer and commercial support, please contact Christian directly. 0.4.0
For users who skipped the 0.3.1 update: please make sure your pruning grid config is correct. The following bugfix in 0.3.1 caused problems for some users:
0.3.1Mostly a bugfix release for zrepl 0.3.
0.3This is a big one! Headlining features:
TIP: We highly recommend studying the updated overview section
of the configuration chapter to understand how replication works.
TIP: Go 1.15 changed the default TLS validation policy to
require Subject Alternative Names (SAN) in certificates. The openssl
commands we provided in the quick-start guides up to and including the zrepl
0.3 docs seem not to work properly. If you encounter certificate validation
errors regarding SAN and wish to continue to use your old certificates, start
the zrepl daemon with env var GODEBUG=x509ignoreCN=0. Alternatively,
generate new certificates with SANs (see both options in the TLS transport docs).
Quick-start guides:
Additional changelog:
0.2.1
0.2
0.1.1
0.1This release is a milestone for zrepl and required significant refactoring if not rewrites of substantial parts of the application. It breaks both configuration and transport format, and thus requires manual intervention and updates on both sides of a replication setup.DANGER: The changes in the pruning system for this release
require you to explicitly define keep rules: for any snapshot that you
want to keep, at least one rule must match. This is different from previous
releases where pruning only affected snapshots with the configured
snapshotting prefix. Make sure that snapshots to be kept or ignored by zrepl
are covered, e.g. by using the regex keep rule. Learn more in the
config docs...
Notes to Package Maintainers
Changes
Previous ReleasesNOTE:Due to limitations in our documentation system, we only show the changelog between the last release and the time this documentation was built. For the changelog of previous releases, use the version selection in the hosted version of these docs at zrepl.github.io.
Donate via Patreon Donate via GitHub Sponsors Donate via Liberapay Donate via PayPal zrepl is a spare-time project primarily developed by Christian Schwarz. You can support maintenance and feature development through one of the services listed above. For SEPA wire transfer and commercial support, please contact Christian directly. Thanks for your support! NOTE: PayPal takes a relatively high fixed processing fee plus a percentage of the donation. Larger, less frequent donations make more sense there.
SupportersWe would like to thank the following people and organizations for supporting zrepl through monetary and other means:
AUTHORChristian SchwarzCOPYRIGHT2017-2019, Christian Schwarz