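Copying GitLab's data, configuration, and log directories wholesale does not actually produce a working backup. The only method GitLab officially provides is the command line: gitlab-rake gitlab:backup:create (GitLab 12.1 and earlier) or gitlab-backup create (GitLab 12.2 and later). Both are normally used for full backups. This article looks at how to do incremental backups in GitLab. What GitLab currently offers as "incremental backup" is not incremental backup in the strict sense; by verifying it hands-on, we will see how the mechanism works and what it actually achieves.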
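Full backup vs. incremental backup
There are actually three common backup strategies; the figure at the top of the article illustrates the middle one, differential backup.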
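Strategy | Backup speed | Disk usage | Files backed up | Files needed to restore | Restore speed |
---|---|---|---|---|---|
Full backup | Slow | High | All files | The full backup | Fast |
Differential backup | Medium | Medium to high | Files changed since the last full backup | The full backup and the latest differential backup | Fast |
Incremental backup | Fast | Low | Files changed since the previous backup of any kind | The full backup and every incremental backup after it | Slow |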
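To make the "files needed to restore" column concrete, here is a minimal bash sketch that takes a strategy name and an ordered list of backup files (oldest first, the first one being the full backup) and prints which files a restore needs. The function name restore_chain and the file names are hypothetical; the function only encodes the table above.

```shell
#!/usr/bin/env bash
# restore_chain STRATEGY FILE...
# FILE... is ordered oldest first; for "differential" and "incremental"
# the first file is the full backup the others are based on.
# Prints the files a restore needs, one per line (hypothetical helper).
restore_chain() {
  local strategy=$1
  shift
  local files=("$@")
  local n=${#files[@]}
  if (( n == 0 )); then
    echo "restore_chain: no backup files given" >&2
    return 1
  fi
  case "$strategy" in
    full)
      # Every full backup is self-contained: the newest one is enough
      printf '%s\n' "${files[n-1]}"
      ;;
    differential)
      # Base full backup plus only the newest differential backup
      printf '%s\n' "${files[0]}"
      if (( n > 1 )); then
        printf '%s\n' "${files[n-1]}"
      fi
      ;;
    incremental)
      # Base full backup plus every incremental backup after it, in order
      printf '%s\n' "${files[@]}"
      ;;
    *)
      echo "restore_chain: unknown strategy '$strategy'" >&2
      return 1
      ;;
  esac
}
```

For example, restore_chain incremental full.tar inc1.tar inc2.tar prints all three names, while restore_chain differential full.tar diff1.tar diff2.tar prints only full.tar and diff2.tar. The longer chain is why incremental restores are the slowest row in the table.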
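Does GitLab support incremental backup?
The question is simple, but the answer is a little roundabout. GitLab does not directly provide an incremental backup feature: there is no command, gitlab-rake or otherwise, that takes a given full backup as its base and generates only the data changed between that backup and a specified point in time. At least for now, no such mechanism exists. For details, see this GitLab issue: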
- https://gitlab.com/gitlab-org/gitlab-foss/-/issues/36975
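The issue above has been closed in favour of the following issue:
That issue has been placed in the backlog. It is something to look forward to, although there is no telling when it will be done; still, a basic capability like this is likely to be improved eventually.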
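Backup procedure
The basic usage is already covered in the backup and restore guide; for details, see:
Environment preparation
For creating and preparing the environment, see:
The source GitLab instance is as follows (in this experiment, host131 is exposed on port 32001 of the Docker host and host132 on port 32002):
Incremental backup steps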
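- Step 1: Run the following command
Command: gitlab-backup create BACKUP=incremental_rsyncable GZIP_RSYNCABLE=yes. BACKUP=incremental_rsyncable gives the archive a fixed name (incremental_rsyncable_gitlab_backup.tar) instead of the default timestamped one, and GZIP_RSYNCABLE=yes makes gzip produce rsync-friendly output (gzip's --rsyncable option).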
[root@host131 gitlab]# docker exec -it gitlab_gitlab_1 sh
# gitlab-backup create BACKUP=incremental_rsyncable GZIP_RSYNCABLE=yes
2020-08-19 22:44:58 +0000 -- Dumping database ...
Dumping PostgreSQL database gitlabhq_production ... [DONE]
2020-08-19 22:45:02 +0000 -- done
2020-08-19 22:45:02 +0000 -- Dumping repositories ...
* root/webhookproject (@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b) ... [DONE]
[SKIPPED] Wiki
2020-08-19 22:45:03 +0000 -- done
2020-08-19 22:45:03 +0000 -- Dumping uploads ...
2020-08-19 22:45:03 +0000 -- done
2020-08-19 22:45:03 +0000 -- Dumping builds ...
2020-08-19 22:45:03 +0000 -- done
2020-08-19 22:45:03 +0000 -- Dumping artifacts ...
2020-08-19 22:45:03 +0000 -- done
2020-08-19 22:45:03 +0000 -- Dumping pages ...
2020-08-19 22:45:03 +0000 -- done
2020-08-19 22:45:03 +0000 -- Dumping lfs objects ...
2020-08-19 22:45:03 +0000 -- done
2020-08-19 22:45:03 +0000 -- Dumping container registry images ...
2020-08-19 22:45:03 +0000 -- [DISABLED]
Creating backup archive: incremental_rsyncable_gitlab_backup.tar ... done
Uploading backup archive to remote storage ... skipped
Deleting tmp directories ... done
done
done
done
done
done
done
done
Deleting old backups ... skipping
Warning: Your gitlab.rb and gitlab-secrets.json files contain sensitive data
and are not included in this backup. You will need these files to restore a backup.
Please back them up manually.
Backup task is done.
# cd /var/opt/gitlab/backups
# ls -l incremental_rsyncable_gitlab_backup.tar
-rw------- 1 git git 184320 Aug 19 22:45 incremental_rsyncable_gitlab_backup.tar
#
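If step 1 is run from a script on the Docker host rather than from an interactive shell inside the container, it could be wrapped as below. This is a sketch under assumptions: the container name gitlab_gitlab_1 is the one from this experiment, the run helper with its DRY_RUN switch is a hypothetical addition that prints commands instead of executing them, and -it is dropped from docker exec because a script usually has no terminal. The archive name follows the pattern visible in the log: the BACKUP value followed by _gitlab_backup.tar.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Container name used in this experiment; override through the environment
GITLAB_CONTAINER=${GITLAB_CONTAINER:-gitlab_gitlab_1}

# Archive file name that gitlab-backup produces for a given BACKUP value
backup_archive_name() {
  printf '%s_gitlab_backup.tar\n' "$1"
}

# Step 1: create a backup with a fixed name and rsync-friendly gzip output
create_rsyncable_backup() {
  local name=${1:-incremental_rsyncable}
  run docker exec "$GITLAB_CONTAINER" \
    gitlab-backup create "BACKUP=$name" GZIP_RSYNCABLE=yes
  # List the resulting archive, like the ls -l at the end of the transcript
  run docker exec "$GITLAB_CONTAINER" \
    ls -l "/var/opt/gitlab/backups/$(backup_archive_name "$name")"
}
```

DRY_RUN=1 create_rsyncable_backup prints the two docker exec commands; without DRY_RUN it runs them. The fixed name matters for what follows: with timestamped names every archive would be a new file, and rsync would have no previous copy on the target to compute a delta against.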
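- Step 2: Copy the backup file into the backups directory and make sure its permissions are correct
Copy the file to the corresponding directory on the target machine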
[root@host131 backups]# scp incremental_rsyncable_gitlab_backup.tar 192.168.163.132:/root/gitlab/data/backups
root@192.168.163.132's password:
incremental_rsyncable_gitlab_backup.tar 100% 180KB 37.8MB/s 00:00
[root@host131 backups]#
Set the permissions
[root@host132 backups]# chmod 644 incremental_rsyncable_gitlab_backup.tar
[root@host132 backups]# pwd
/root/gitlab/data/backups
[root@host132 backups]#
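Step 2 can be scripted in the same way. The sketch assumes the target host 192.168.163.132 and the host-side directory /root/gitlab/data/backups from this experiment (that directory appears to be mounted as /var/opt/gitlab/backups in the target container, judging by the listing in step 4), root SSH access to the target, and a hypothetical run helper whose DRY_RUN switch prints commands instead of running them. It performs the chmod over ssh rather than in a second terminal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Target host and host-side backups directory used in this experiment
TARGET_HOST=${TARGET_HOST:-192.168.163.132}
TARGET_DIR=${TARGET_DIR:-/root/gitlab/data/backups}

# Step 2: copy the archive to the target and make it readable there
copy_backup_to_target() {
  local archive=$1
  local base
  base=$(basename "$archive")
  run scp "$archive" "$TARGET_HOST:$TARGET_DIR/"
  # Same permission change as in the transcript (mode 644)
  run ssh "$TARGET_HOST" chmod 644 "$TARGET_DIR/$base"
}
```

Usage: copy_backup_to_target incremental_rsyncable_gitlab_backup.tar, run from the source's backups directory.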
- Step 3: Use gitlab-ctl to stop the unicorn (or puma) and sidekiq services
Stop the services
[root@host132 backups]# docker exec -it gitlab_gitlab_1 sh
# gitlab-ctl stop unicorn
ok: down: unicorn: 0s, normally up
# gitlab-ctl stop sidekiq
ok: down: sidekiq: 0s, normally up
#
Confirm the status
# gitlab-ctl status
run: alertmanager: (pid 1372) 1630s; run: log: (pid 942) 1790s
run: gitaly: (pid 1349) 1633s; run: log: (pid 459) 1897s
run: gitlab-exporter: (pid 1326) 1634s; run: log: (pid 865) 1807s
run: gitlab-workhorse: (pid 1318) 1635s; run: log: (pid 804) 1826s
run: grafana: (pid 1387) 1629s; run: log: (pid 1243) 1664s
run: logrotate: (pid 837) 1817s; run: log: (pid 846) 1815s
run: nginx: (pid 819) 1823s; run: log: (pid 831) 1820s
run: postgres-exporter: (pid 1381) 1630s; run: log: (pid 959) 1784s
run: postgresql: (pid 490) 1892s; run: log: (pid 615) 1889s
run: prometheus: (pid 1343) 1633s; run: log: (pid 914) 1796s
run: redis: (pid 424) 1904s; run: log: (pid 439) 1903s
run: redis-exporter: (pid 1329) 1634s; run: log: (pid 893) 1802s
down: sidekiq: 44s, normally up; run: log: (pid 780) 1832s
run: sshd: (pid 31) 1919s; run: log: (pid 30) 1919s
down: unicorn: 61s, normally up; run: log: (pid 756) 1840s
#
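The status check in step 3 can be automated by reading the gitlab-ctl status output and requiring a down: line for every service that was stopped. A sketch, assuming the container name from this experiment and a hypothetical run/DRY_RUN helper; pass puma instead of unicorn on versions where Puma is the web server. all_down is a hypothetical name and only looks at the line prefix seen in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

GITLAB_CONTAINER=${GITLAB_CONTAINER:-gitlab_gitlab_1}

# Reads `gitlab-ctl status` output on stdin; succeeds only if every
# service named as an argument is reported as "down:"
all_down() {
  local status svc
  status=$(cat)
  for svc in "$@"; do
    if ! grep -q "^down: $svc:" <<<"$status"; then
      echo "not stopped: $svc" >&2
      return 1
    fi
  done
}

# Step 3: stop the web server (unicorn or puma) and sidekiq, then verify
stop_app_services() {
  local web=${1:-unicorn}
  run docker exec "$GITLAB_CONTAINER" gitlab-ctl stop "$web"
  run docker exec "$GITLAB_CONTAINER" gitlab-ctl stop sidekiq
  # Skip the live status check in dry-run mode
  if [[ "${DRY_RUN:-0}" != 1 ]]; then
    docker exec "$GITLAB_CONTAINER" gitlab-ctl status | all_down "$web" sidekiq
  fi
}
```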
- Step 4: Restore the data with gitlab-backup restore
# pwd
/var/opt/gitlab/backups
# ls -l
total 180
-rw-r--r-- 1 root root 184320 Aug 19 23:02 incremental_rsyncable_gitlab_backup.tar
#
# gitlab-backup restore BACKUP=incremental_rsyncable
Unpacking backup ... done
Before restoring the database, we will remove all existing
tables to avoid future upgrade problems. Be aware that if you have
custom tables in the GitLab database these tables and all data will be
removed.
Do you want to continue (yes/no)? yes
Removing all tables. Press `Ctrl-C` within 5 seconds to abort
2020-08-19 23:20:42 +0000 -- Cleaning the database ...
2020-08-19 23:20:43 +0000 -- done
2020-08-19 23:20:43 +0000 -- Restoring database ...
Restoring PostgreSQL database gitlabhq_production ... SET
SET
SET
SET
SET
set_config
------------
(1 row)
SET
SET
SET
SET
ERROR: relation "public.u2f_registrations" does not exist
ERROR: relation "public.timelogs" does not exist
... (omitted)
ALTER TABLE
ALTER TABLE
[DONE]
2020-08-19 23:20:55 +0000 -- done
2020-08-19 23:20:55 +0000 -- Restoring repositories ...
* root/webhookproject ... [DONE]
2020-08-19 23:20:56 +0000 -- done
2020-08-19 23:20:56 +0000 -- Restoring uploads ...
2020-08-19 23:20:56 +0000 -- done
2020-08-19 23:20:56 +0000 -- Restoring builds ...
2020-08-19 23:20:56 +0000 -- done
2020-08-19 23:20:56 +0000 -- Restoring artifacts ...
2020-08-19 23:20:56 +0000 -- done
2020-08-19 23:20:56 +0000 -- Restoring pages ...
2020-08-19 23:20:56 +0000 -- done
2020-08-19 23:20:56 +0000 -- Restoring lfs objects ...
2020-08-19 23:20:56 +0000 -- done
This task will now rebuild the authorized_keys file.
You will lose any data stored in the authorized_keys file.
Do you want to continue (yes/no)? yes
Warning: Your gitlab.rb and gitlab-secrets.json files contain sensitive data
and are not included in this backup. You will need to restore these files manually.
Restore task is done.
#
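Note that BACKUP takes the archive name without the _gitlab_backup.tar suffix. The sketch below derives that value from the file name and starts the restore; it keeps docker exec -it because the restore asks two yes/no questions, as the transcript shows. backup_id_from_archive, restore_backup and the run/DRY_RUN helper are hypothetical; the container name is the one from this experiment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

GITLAB_CONTAINER=${GITLAB_CONTAINER:-gitlab_gitlab_1}

# Turn ".../NAME_gitlab_backup.tar" into NAME, the value BACKUP= expects
backup_id_from_archive() {
  local base
  base=$(basename "$1")
  if [[ "$base" != *_gitlab_backup.tar ]]; then
    echo "not a GitLab backup archive: $base" >&2
    return 1
  fi
  printf '%s\n' "${base%_gitlab_backup.tar}"
}

# Step 4: restore from an archive already placed in the backups directory
restore_backup() {
  local id
  id=$(backup_id_from_archive "$1") || return 1
  # -it keeps the interactive yes/no prompts working
  run docker exec -it "$GITLAB_CONTAINER" gitlab-backup restore "BACKUP=$id"
}
```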
- Step 5: Manually restore gitlab-secrets.json and gitlab.rb
This step is skipped in this experiment.
- Step 6: Reconfigure and restart the services, then run the checks
Commands: gitlab-ctl reconfigure && gitlab-ctl restart && gitlab-rake gitlab:check SANITIZE=true
# gitlab-ctl reconfigure
Starting Chef Client, version 14.14.29
resolving cookbooks for run list: ["gitlab"]
Synchronizing Cookbooks:
... (omitted)
Running handlers:
Running handlers complete
Chef Client finished, 4/699 resources updated in 12 seconds
gitlab Reconfigured!
#
# gitlab-ctl restart
ok: run: alertmanager: (pid 5302) 0s
ok: run: gitaly: (pid 5310) 0s
ok: run: gitlab-exporter: (pid 5317) 1s
ok: run: gitlab-workhorse: (pid 5319) 0s
ok: run: grafana: (pid 5333) 0s
ok: run: logrotate: (pid 5344) 1s
ok: run: nginx: (pid 5354) 0s
ok: run: postgres-exporter: (pid 5366) 1s
ok: run: postgresql: (pid 5374) 0s
ok: run: prometheus: (pid 5384) 1s
ok: run: redis: (pid 5391) 0s
ok: run: redis-exporter: (pid 5397) 0s
ok: run: sidekiq: (pid 5402) 1s
ok: run: sshd: (pid 5408) 0s
ok: run: unicorn: (pid 5410) 1s
#
# gitlab-rake gitlab:check SANITIZE=true
Checking GitLab subtasks ...
Checking GitLab Shell ...
GitLab Shell: ... GitLab Shell version >= 12.2.0 ? ... OK (12.2.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: FAILED - Internal API unreachable
gitlab-shell self-check failed
Try fixing it:
Make sure GitLab is running;
Check the gitlab-shell configuration file:
sudo -u git -H editor /opt/gitlab/embedded/service/gitlab-shell/config.yml
Please fix the error above and rerun the checks.
Checking GitLab Shell ... Finished
Checking Gitaly ...
Gitaly: ... default ... OK
Checking Gitaly ... Finished
Checking Sidekiq ...
Sidekiq: ... Running? ... no
Try fixing it:
sudo -u git -H RAILS_ENV=production bin/background_jobs start
For more information see:
doc/install/installation.md in section "Install Init Script"
see log/sidekiq.log for possible errors
Please fix the error above and rerun the checks.
Checking Sidekiq ... Finished
Checking Incoming Email ...
Incoming Email: ... Reply by email is disabled in config/gitlab.yml
Checking Incoming Email ... Finished
Checking LDAP ...
LDAP: ... LDAP is disabled in config/gitlab.yml
Checking LDAP ... Finished
Checking GitLab App ...
Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... skipped (no tmp uploads folder yet)
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
Projects have namespace: ...
1/1 ... yes
Redis version >= 4.0.0? ... yes
Ruby version >= 2.5.3 ? ... yes (2.6.5)
Git version >= 2.22.0 ? ... yes (2.26.2)
Git user has default SSH configuration? ... yes
Active users: ... 1
Is authorized keys file accessible? ... yes
Checking GitLab App ... Finished
Checking GitLab subtasks ... Finished
#
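Note: although the check reported that Sidekiq was not running, gitlab-ctl status showed it as up. This was most likely because the services were still starting: during this period the restored GitLab instance returned 502 errors, but after waiting a while it worked normally. The gitlab-shell "Internal API unreachable" failure above is probably due to the same startup delay.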
Logging in to the restored GitLab instance (host132, exposed on port 32002 of the Docker host in this experiment) shows that the data has been restored.
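Because the restored instance answered with 502 for a while after the restart, a script should wait before running the check. The sketch below retries an HTTP request until it succeeds. The URL http://127.0.0.1:32002/users/sign_in is a placeholder that assumes the script runs on the machine exposing port 32002; curl -f treats a 502 as a failure, so the loop only ends once GitLab serves pages. wait_until, post_restore and the run/DRY_RUN helper are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

GITLAB_CONTAINER=${GITLAB_CONTAINER:-gitlab_gitlab_1}
# Placeholder URL of the restored instance; adjust to your port mapping
GITLAB_URL=${GITLAB_URL:-http://127.0.0.1:32002/users/sign_in}

# Retry a command until it succeeds: at most MAX_TRIES attempts,
# sleeping INTERVAL seconds between them
wait_until() {
  local max_tries=$1 interval=$2 i
  shift 2
  for (( i = 1; i <= max_tries; i++ )); do
    if "$@"; then
      return 0
    fi
    if (( i < max_tries )); then
      sleep "$interval"
    fi
  done
  return 1
}

# Step 6: reconfigure, restart, wait for the web UI, then run the checks
post_restore() {
  run docker exec "$GITLAB_CONTAINER" gitlab-ctl reconfigure
  run docker exec "$GITLAB_CONTAINER" gitlab-ctl restart
  # curl -f fails on HTTP errors such as 502, so this waits for a real page
  wait_until 30 10 run curl -fsS -o /dev/null "$GITLAB_URL" || return 1
  run docker exec "$GITLAB_CONTAINER" gitlab-rake gitlab:check SANITIZE=true
}
```

With these values the wait is capped at roughly five minutes (30 attempts, 10 seconds apart); if it times out, post_restore returns non-zero without running the check.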
Incremental backup
Only now do we actually start verifying the incremental backup. Run the backup again; the output is as follows:
# gitlab-backup create BACKUP=incremental_rsyncable GZIP_RSYNCABLE=yes
2020-08-19 23:43:37 +0000 -- Dumping database ...
Dumping PostgreSQL database gitlabhq_production ... [DONE]
2020-08-19 23:43:43 +0000 -- done
2020-08-19 23:43:43 +0000 -- Dumping repositories ...
* root/webhookproject (@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b) ... [DONE]
[SKIPPED] Wiki
2020-08-19 23:43:44 +0000 -- done
2020-08-19 23:43:44 +0000 -- Dumping uploads ...
2020-08-19 23:43:44 +0000 -- done
2020-08-19 23:43:44 +0000 -- Dumping builds ...
2020-08-19 23:43:44 +0000 -- done
2020-08-19 23:43:44 +0000 -- Dumping artifacts ...
2020-08-19 23:43:44 +0000 -- done
2020-08-19 23:43:44 +0000 -- Dumping pages ...
2020-08-19 23:43:44 +0000 -- done
2020-08-19 23:43:44 +0000 -- Dumping lfs objects ...
2020-08-19 23:43:44 +0000 -- done
2020-08-19 23:43:44 +0000 -- Dumping container registry images ...
2020-08-19 23:43:44 +0000 -- [DISABLED]
Creating backup archive: incremental_rsyncable_gitlab_backup.tar ... done
Uploading backup archive to remote storage ... skipped
Deleting tmp directories ... done
done
done
done
done
done
done
done
Deleting old backups ... skipping
Warning: Your gitlab.rb and gitlab-secrets.json files contain sensitive data
and are not included in this backup. You will need these files to restore a backup.
Please back them up manually.
Backup task is done.
# ls -l incrementa*
-rw------- 1 git git 184320 Aug 19 23:43 incremental_rsyncable_gitlab_backup.tar
#
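The new archive has exactly the same size as the first one (184,320 bytes), but equal size does not imply equal content. Assuming the previous archive was copied aside before re-running gitlab-backup (for example as incremental_rsyncable_gitlab_backup.tar.prev, a hypothetical name), a small helper like the one below counts how many byte positions differ. count_differing_bytes is hypothetical; it relies on cmp -l printing one line per differing byte.

```shell
#!/usr/bin/env bash
# Count how many byte positions differ between two files of equal size.
# cmp -l prints one line per differing byte and nothing when files match.
count_differing_bytes() {
  local old=$1 new=$2
  if [[ $(wc -c <"$old") -ne $(wc -c <"$new") ]]; then
    echo "sizes differ: $old vs $new" >&2
    return 1
  fi
  cmp -l "$old" "$new" | wc -l | tr -d ' '
}
```

Because cmp compares position by position, a single inserted byte makes everything after it count as different, so this is only a rough indicator. rsync's rolling checksum copes with shifted data, which is why the amount it actually transfers can be far smaller.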
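Neither the run time nor the result differs noticeably from the first backup (the archive is again 184,320 bytes). Since scp was used for the previous copy, this time we use rsync, which the official documentation suggests for incremental transfers. The first transfer looks like this: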
# rsync -vzrtopg --progress incremental_rsyncable_gitlab_backup.tar 192.168.163.132:/root/gitlab/data/backups
root@192.168.163.132's password:
sending incremental file list
incremental_rsyncable_gitlab_backup.tar
184,320 100% 25.02MB/s 0:00:00 (xfr#1, to-chk=0/1)
sent 131 bytes received 1,619 bytes 388.89 bytes/sec
total size is 184,320 speedup is 105.33
#
Second transfer (gitlab-backup was not run again in between):
# rsync -vzrtopg --progress incremental_rsyncable_gitlab_backup.tar 192.168.163.132:/root/gitlab/data/backups
root@192.168.163.132's password:
sending incremental file list
sent 88 bytes received 12 bytes 22.22 bytes/sec
total size is 184,320 speedup is 1,843.20
#
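For reference, the transfer above can be wrapped as follows. The options are the ones used in the transcript: -v verbose, -z compress in transit, -r recurse, -t keep modification times, -o/-g/-p keep owner, group and permissions. -t explains the second run: rsync's default quick check skips a file whose size and modification time already match on the target, so no file was listed at all. Note that -p (and, when the receiver runs as root, -o/-g) carry over the source's mode and ownership, which is -rw------- git git in the listing above, so the chmod 644 from step 2 no longer applies; check the permissions on the target before restoring. The target host and directory are the ones from this experiment; push_backup_rsync and the run/DRY_RUN helper are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

TARGET_HOST=${TARGET_HOST:-192.168.163.132}
TARGET_DIR=${TARGET_DIR:-/root/gitlab/data/backups}

# Push the archive with the same options as in the transcript.
# -t matters: with matching size and mtime rsync skips the file entirely.
push_backup_rsync() {
  local archive=$1
  run rsync -vzrtopg --progress "$archive" "$TARGET_HOST:$TARGET_DIR"
}
```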
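Run the backup once more. Nothing in the output shows any differential backup taking place; the archive is regenerated in full: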
# date
Wed Aug 19 23:54:46 UTC 2020
# gitlab-backup create BACKUP=incremental_rsyncable GZIP_RSYNCABLE=yes
2020-08-19 23:55:44 +0000 -- Dumping database ...
Dumping PostgreSQL database gitlabhq_production ... [DONE]
2020-08-19 23:55:51 +0000 -- done
2020-08-19 23:55:51 +0000 -- Dumping repositories ...
* root/webhookproject (@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b) ... [DONE]
[SKIPPED] Wiki
2020-08-19 23:55:52 +0000 -- done
2020-08-19 23:55:52 +0000 -- Dumping uploads ...
2020-08-19 23:55:52 +0000 -- done
2020-08-19 23:55:52 +0000 -- Dumping builds ...
2020-08-19 23:55:52 +0000 -- done
2020-08-19 23:55:52 +0000 -- Dumping artifacts ...
2020-08-19 23:55:52 +0000 -- done
2020-08-19 23:55:52 +0000 -- Dumping pages ...
2020-08-19 23:55:52 +0000 -- done
2020-08-19 23:55:52 +0000 -- Dumping lfs objects ...
2020-08-19 23:55:53 +0000 -- done
2020-08-19 23:55:53 +0000 -- Dumping container registry images ...
2020-08-19 23:55:53 +0000 -- [DISABLED]
Creating backup archive: incremental_rsyncable_gitlab_backup.tar ... done
Uploading backup archive to remote storage ... skipped
Deleting tmp directories ... done
done
done
done
done
done
done
done
Deleting old backups ... skipping
Warning: Your gitlab.rb and gitlab-secrets.json files contain sensitive data
and are not included in this backup. You will need these files to restore a backup.
Please back them up manually.
Backup task is done.
# date
Wed Aug 19 23:55:57 UTC 2020
#
Run the rsync transfer again. The only gain shows up in rsync's speedup figure:
# rsync -vzrtopg --progress incremental_rsyncable_gitlab_backup.tar 192.168.163.132:/root/gitlab/data/backups
root@192.168.163.132's password:
sending incremental file list
incremental_rsyncable_gitlab_backup.tar
184,320 100% 10.78MB/s 0:00:00 (xfr#1, to-chk=0/1)
sent 2,057 bytes received 1,619 bytes 816.89 bytes/sec
total size is 184,320 speedup is 50.14
#
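rsync's speedup is the total file size divided by the bytes actually sent plus received. The hypothetical awk helper below reads rsync's summary lines and reproduces the three figures above: 184,320 / (131 + 1,619) ≈ 105.33, 184,320 / (88 + 12) = 1,843.20 and 184,320 / (2,057 + 1,619) ≈ 50.14. In other words, after the third backup only about 2 KB of the 180 KB archive was sent, which is the incremental effect that GZIP_RSYNCABLE plus rsync provides; the backup itself was still produced in full.

```shell
#!/usr/bin/env bash
# Read rsync --verbose output on stdin and print
# total size / (bytes sent + bytes received), i.e. rsync's "speedup".
# Thousands separators (184,320) are stripped before the fields are read.
rsync_speedup() {
  awk '
    /^sent [0-9,]+ bytes/ { gsub(",", ""); sent = $2; recv = $5 }
    /^total size is /     { gsub(",", ""); total = $4 }
    END {
      if (sent + recv > 0) printf "%.2f\n", total / (sent + recv)
      else exit 1
    }
  '
}
```

Usage: rsync -vzrtopg --progress FILE HOST:DIR | rsync_speedup.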
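Summary
For GitLab incremental backups at this stage, combining SKIP=tar with rsync may work better. Even so, this only makes the transfer incremental: each backup is still a full backup.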
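The SKIP=tar plus rsync idea could look like the sketch below. It is untested and rests on assumptions: that with SKIP=tar gitlab-backup leaves the individual parts of the backup (database dump, repositories, uploads and so on) in the backups directory instead of packing them into one tar; that this directory is /root/gitlab/data/backups on the Docker host, as in this experiment; and that a dedicated untarred subdirectory on the target is acceptable. backup_untarred_and_sync, SOURCE_DIR, SYNC_TARGET_DIR and the run/DRY_RUN helper are hypothetical. Restoring from the synced directory is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

GITLAB_CONTAINER=${GITLAB_CONTAINER:-gitlab_gitlab_1}
SOURCE_DIR=${SOURCE_DIR:-/root/gitlab/data/backups}
TARGET_HOST=${TARGET_HOST:-192.168.163.132}
# Hypothetical dedicated directory so other archives on the target are untouched
SYNC_TARGET_DIR=${SYNC_TARGET_DIR:-/root/gitlab/data/backups/untarred}

backup_untarred_and_sync() {
  # Dump every component but skip packing them into a single tar
  run docker exec "$GITLAB_CONTAINER" gitlab-backup create SKIP=tar GZIP_RSYNCABLE=yes
  # Trailing slash on the source copies the directory contents;
  # existing *_gitlab_backup.tar archives are left out of the sync
  run rsync -vzrtopg --progress --exclude '*_gitlab_backup.tar' \
    "$SOURCE_DIR/" "$TARGET_HOST:$SYNC_TARGET_DIR/"
}
```

The design choice is simply to let rsync compare the individual parts rather than one large tar; whether that transfers less than the rsyncable tar above would need to be measured, for instance with rsync's speedup figure.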
Reposted from: https://blog.csdn.net/liumiaocn/article/details/107936967