Replicas

Cluster architecture

Assume a cluster laid out as follows:

01	02	03
Zookeeper	Zookeeper	Zookeeper
ClickHouse	ClickHouse	ClickHouse
192.168.13.146	192.168.13.147	192.168.13.148

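Zookeeper cluster

Configure the cluster as described in "Introduction to Java Backend Development: 20. Dubbo and Zookeeper".

Ports

Port 9009 must be open, because it is used for data replication.

Otherwise, an error like the following may appear:

Timeout: connect timed out: 192.168.13.146:9009, Stack trace (when copying this message, always include the lines below):

To open the port, for example:

firewall-cmd --zone=public --add-port=9009/tcp --permanent
firewall-cmd --reload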

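Configuration files

You can use either an external configuration file or the main configuration file itself. After making changes, restart ClickHouse.

External configuration file

On each of the three machines, create a configuration file named metrika.xml in the /etc/clickhouse-server/config.d directory with the following content: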

<yandex>
    <zookeeper-servers>
        <node index="1">
            <host>192.168.13.146</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>192.168.13.147</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>192.168.13.148</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
</yandex>

The file's owner and group also need to be changed, for example:

chown clickhouse:clickhouse metrika.xml

Open config.xml and you will find this passage:

<!-- If element has 'incl' attribute, then for it's value will be used corresponding substitution from another file.
     By default, path to file with substitutions is /etc/metrika.xml. It could be changed in config in 'include_from' element.
     Values for substitutions are specified in /clickhouse/name_of_substitution elements in that file.
  -->

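To specify the path to metrika.xml, add the following:

<zookeeper incl="zookeeper-servers" optional="true" />
<include_from>/etc/clickhouse-server/config.d/metrika.xml</include_from>

Internal configuration file

Alternatively, skip the external file and specify the Zookeeper nodes directly in config.xml, under the <zookeeper> tag.

Creating the table

Replication synchronizes only data, not table structure, so the table must be created manually on every machine. The CREATE TABLE statement is as follows: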

create table t_order_rep
(
    id           UInt32,
    sku_id       String,
    total_amount Decimal(16, 2),
    create_time  Datetime
) engine ReplicatedMergeTree('/clickhouse/table/01/t_order_rep', '01')
      partition by toYYYYMMDD(create_time)
      primary key (id)
      order by (id, sku_id);

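Explanation:

ReplicatedMergeTree is the replicated version of MergeTree. The other MergeTree-family engines discussed also have replicated versions, such as ReplicatedSummingMergeTree and ReplicatedReplacingMergeTree.
See: https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication
The first parameter, '/clickhouse/table/01/t_order_rep', is the path under which the table is registered in Zookeeper:
- clickhouse is a convention indicating that the entry belongs to ClickHouse.
- table is also a convention, indicating a ClickHouse table.
- 01 is the shard; in this example there is only one shard.
- t_order_rep is the table name.
The second parameter, '01', is the replica name. Each replica must use a different name, for example '02' and '03' on the other two machines.

After creating the table, try it out: insert data on one machine and query it on another.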

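Shards

What is a shard

Replicas improve data availability and reduce the risk of data loss, but every server must still hold the full data set, so they do not solve horizontal scaling.
To split data horizontally, we need the concept of shards.
Sharding splits a complete data set into parts, places different shards on different nodes, and then uses the Distributed table engine to stitch the data back together for querying.
The Distributed table engine stores no data itself. Somewhat like MyCat for MySQL, it acts as middleware: a distributed logical table through which writes, distribution, and routing reach the shards on multiple nodes.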
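
The routing idea can be illustrated with a small sketch. It is not ClickHouse's implementation: it assumes equally weighted shards, uses zlib.crc32 as a stand-in for an integer hash of the sharding key, and the names shard_for and route_rows are hypothetical.

```python
import zlib
from collections import defaultdict


def shard_for(key: str, shard_count: int) -> int:
    """Map a sharding key to a shard index (assumption: equal shard weights)."""
    # zlib.crc32 is only a stand-in integer hash; it is not what ClickHouse uses.
    return zlib.crc32(key.encode("utf-8")) % shard_count


def route_rows(rows, key_field, shard_count):
    """Group rows by the shard their sharding key maps to."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[shard_for(row[key_field], shard_count)].append(row)
    return dict(buckets)


rows = [
    {"id": 201, "sku_id": "sku_001"},
    {"id": 202, "sku_id": "sku_002"},
    {"id": 204, "sku_id": "sku_002"},
]
routed = route_rows(rows, "sku_id", shard_count=2)
for shard, shard_rows in sorted(routed.items()):
    print(shard, [r["id"] for r in shard_rows])
```

Rows with the same key always land on the same shard, which is the property a sharding key is meant to provide.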

Example

Configuration file

Suppose there are two shards, one with two replicas and the other with only one replica. The configuration file is then:

<yandex>
    <remote_servers>
        <!-- Cluster name -->
        <gmall_cluster>
            <!-- First shard of the cluster -->
            <shard>
                <!-- Enable internal replication -->
                <internal_replication>true</internal_replication>
                <!-- First replica of this shard -->
                <replica>
                    <host>node01</host>
                    <port>9000</port>
                </replica>
                <!-- Second replica of this shard -->
                <replica>
                    <host>node02</host>
                    <port>9000</port>
                </replica>
            </shard>
            <!-- Second shard of the cluster -->
            <shard>
                <!-- Enable internal replication -->
                <internal_replication>true</internal_replication>
                <!-- First replica of this shard -->
                <replica>
                    <host>node03</host>
                    <port>9000</port>
                </replica>
            </shard>
        </gmall_cluster>
    </remote_servers>
    <zookeeper-servers>
        <node index="1">
            <host>node01</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>node02</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>node03</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <!-- Variables for engine parameters -->
    <macros>
        <!-- Shard name; nodes with the same value are replicas of each other -->
        <share>1</share>
        <!-- Replica name; different on every node -->
        <replica>rep_1_1</replica>
    </macros>
</yandex>

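For the second machine, the macros section is:

<macros>
    <!-- Shard name; nodes with the same value are replicas of each other -->
    <share>1</share>
    <!-- Replica name; different on every node -->
    <replica>rep_1_2</replica>
</macros>

For the third machine, the macros section is:

<macros>
    <!-- Shard name; nodes with the same value are replicas of each other -->
    <share>2</share>
    <!-- Replica name; different on every node -->
    <replica>rep_2_1</replica>
</macros>

Creating the local tables

Example: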

create table st_order_mt on cluster gmall_cluster
(
    id           UInt32,
    sku_id       String,
    total_amount Decimal(16, 2),
    create_time  Datetime
) engine = ReplicatedMergeTree('/clickhouse/{share}/st_order_mt', '{replica}')
      partition by toYYYYMMDD(create_time)
      primary key (id)
      order by (id, sku_id);

Output:


┌─host───┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ node03 │ 9000 │      0 │       │                   2 │                0 │
│ node01 │ 9000 │      0 │       │                   1 │                0 │
│ node02 │ 9000 │      0 │       │                   0 │                0 │
└────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

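Explanation:

The statement only needs to be run on one node; on cluster propagates it to the other nodes.
The cluster name must match the one in the configuration file.
The shard and replica names are taken from the macros defined in the configuration file.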
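
To see how the macros shape the Zookeeper path and replica name on each node, the substitution can be mimicked in a few lines. This is only an illustration of the idea, not ClickHouse code; expand_macros and NODE_MACROS are hypothetical names, and the values are copied from the macros sections of the three configuration files.

```python
import re

# Macro values per node, copied from the <macros> sections of the three config files.
NODE_MACROS = {
    "node01": {"share": "1", "replica": "rep_1_1"},
    "node02": {"share": "1", "replica": "rep_1_2"},
    "node03": {"share": "2", "replica": "rep_2_1"},
}


def expand_macros(template: str, macros: dict) -> str:
    """Replace each {name} placeholder with the node's macro value."""
    return re.sub(r"\{(\w+)\}", lambda m: macros[m.group(1)], template)


for node, macros in NODE_MACROS.items():
    path = expand_macros("/clickhouse/{share}/st_order_mt", macros)
    replica = expand_macros("{replica}", macros)
    print(node, path, replica)
```

node01 and node02 resolve to the same path with different replica names, which is what makes them replicas of one shard; node03 resolves to a different path and forms the second shard.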

Creating the distributed table

Create the distributed table, for example:

create table st_order_mt_all on cluster gmall_cluster
(
    id           UInt32,
    sku_id       String,
    total_amount Decimal(16, 2),
    create_time  Datetime
) engine = Distributed(gmall_cluster, default, st_order_mt, hiveHash(sku_id));

Output:


┌─host───┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ node02 │ 9000 │      0 │       │                   2 │                0 │
└────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
┌─host───┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ node03 │ 9000 │      0 │       │                   1 │                0 │
└────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
┌─host───┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ node01 │ 9000 │      0 │       │                   0 │                0 │
└────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

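Explanation:

The statement only needs to be run on one node; on cluster propagates it to the other nodes.
The engine parameters are Distributed(cluster name, database name, local table name, sharding key).
The sharding key must be an integer, so the hiveHash function is used to convert sku_id.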

Verification

Inserting data

Insert data, for example:

insert into st_order_mt_all
values (201, 'sku_001', 1000.00, '2020-06-01 12:00:00'),
       (202, 'sku_002', 2000.00, '2020-06-01 12:00:00'),
       (203, 'sku_004', 2500.00, '2020-06-01 12:00:00'),
       (204, 'sku_002', 2000.00, '2020-06-01 12:00:00'),
       (205, 'sku_003', 600.00, '2020-06-02 12:00:00');

Checking

Query the distributed table, for example:

SELECT * FROM st_order_mt_all;

Output:


┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐
│ 201 │ sku_001 │         1000 │ 2020-06-01 12:00:00 │
└─────┴─────────┴──────────────┴─────────────────────┘
┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐
│ 205 │ sku_003 │          600 │ 2020-06-02 12:00:00 │
└─────┴─────────┴──────────────┴─────────────────────┘
┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐
│ 202 │ sku_002 │         2000 │ 2020-06-01 12:00:00 │
│ 203 │ sku_004 │         2500 │ 2020-06-01 12:00:00 │
│ 204 │ sku_002 │         2000 │ 2020-06-01 12:00:00 │
└─────┴─────────┴──────────────┴─────────────────────┘

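Querying st_order_mt on each of the three nodes shows that the layout "two shards, one with two replicas and the other with a single replica" has been achieved: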


┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐
│ 202 │ sku_002 │         2000 │ 2020-06-01 12:00:00 │
│ 203 │ sku_004 │         2500 │ 2020-06-01 12:00:00 │
│ 204 │ sku_002 │         2000 │ 2020-06-01 12:00:00 │
└─────┴─────────┴──────────────┴─────────────────────┘


┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐
│ 202 │ sku_002 │         2000 │ 2020-06-01 12:00:00 │
│ 203 │ sku_004 │         2500 │ 2020-06-01 12:00:00 │
│ 204 │ sku_002 │         2000 │ 2020-06-01 12:00:00 │
└─────┴─────────┴──────────────┴─────────────────────┘


┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐
│ 201 │ sku_001 │         1000 │ 2020-06-01 12:00:00 │
└─────┴─────────┴──────────────┴─────────────────────┘
┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐
│ 205 │ sku_003 │          600 │ 2020-06-02 12:00:00 │
└─────┴─────────┴──────────────┴─────────────────────┘

Backup

clickhouse-backup

clickhouse-backup is an open-source tool for backing up ClickHouse.
GitHub: https://github.com/Altinity/clickhouse-backup

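Installation

Download

Download the release archive, for example:

wget https://github.com/Altinity/clickhouse-backup/releases/download/v2.5.26/clickhouse-backup-linux-amd64.tar.gz

Extract

Extract the archive, for example:

tar xvf clickhouse-backup-linux-amd64.tar.gz

Creating a symbolic link

Create a symbolic link, for example (the source path assumes the archive was extracted under /usr/local/clickhouse-backup):

ln -sv /usr/local/clickhouse-backup/build/linux/amd64/clickhouse-backup /usr/local/bin

Test the clickhouse-backup command, for example:

clickhouse-backup -v

Output: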

Version:         2.5.26
Git Commit:      270e0ed5bbe4dbcbcb64fd2e7a31809af53ab4f0
Build Date:      2024-08-07

Configuration file

Create the configuration file, for example:

mkdir -p /etc/clickhouse-backup/
cd /etc/clickhouse-backup/
vim config.yml

The configuration file contains:

general:
  remote_storage: none
  # Number of local backups to keep; the default 0 disables automatic cleanup
  backups_to_keep_local: 7
  # Number of remote backups to keep
  # backups_to_keep_remote: 1
clickhouse:
  username: default
  # password: "XXXXXX"
  host: localhost
  port: 9000
  # data_path: "/var/lib/clickhouse"

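Explanation:

data_path defaults to /var/lib/clickhouse; if clickhouse-server stores its data elsewhere, set it explicitly.
backups_to_keep_remote applies to remote backups, for example backups uploaded to Alibaba Cloud OSS.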
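
What a retention setting such as backups_to_keep_local: 7 implies can be sketched as follows. This illustrates the general keep-the-newest-N idea under stated assumptions, not clickhouse-backup's actual code; backups_to_delete and the sample backup names are hypothetical.

```python
from datetime import datetime


def backups_to_delete(backups, keep):
    """Return names of backups that fall outside the newest `keep` entries.

    `backups` is a list of (name, created_at) pairs; keep <= 0 means no cleanup.
    """
    if keep <= 0:
        return []
    newest_first = sorted(backups, key=lambda b: b[1], reverse=True)
    return [name for name, _ in newest_first[keep:]]


# Hypothetical backups, one per day.
backups = [(f"bk_{day:02d}", datetime(2024, 8, day)) for day in range(1, 10)]
print(backups_to_delete(backups, keep=7))
```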

Backing up

Listing the tables that can be backed up

List the tables that can be backed up, for example:

clickhouse-backup tables

Output:


[partial output omitted]

test.t_order             1.88KiB  default  full
default.t_order_rep1     1.27KiB  default  full
default.st_order_mt      0B       default  full
default.st_order_mt_all  0B       default  full

[partial output omitted]

Full-instance backup

Back up the whole instance, for example:

clickhouse-backup create

Backups are stored in /var/lib/clickhouse/backup/, for example:

ll /var/lib/clickhouse/backup/

Output:

drwxr-x--- 4 clickhouse clickhouse 57 Aug 8 02:51 2024-08-08T09-51-23

The backup name defaults to a timestamp; you can also specify a name manually, for example:

clickhouse-backup create ch_bk_20240808

Let's look at the contents of ch_bk_20240808, for example:

ll ch_bk_20240808

Output:

drwxr-x--- 4 clickhouse clickhouse  33 Aug  8 02:54 metadata
-rw-r----- 1 clickhouse clickhouse 840 Aug  8 02:54 metadata.json
drwxr-x--- 4 clickhouse clickhouse  33 Aug  8 02:54 shadow

Explanation:

metadata: contains the DDL SQL needed to recreate the objects.
shadow: contains the data produced by the ALTER TABLE ... FREEZE operation.

Single-table backup

Syntax:

clickhouse-backup create [-t, --tables=<db>.<table>] <backup_name>

Back up the t_order table in the test database, for example:

clickhouse-backup create -t test.t_order ch_t_order_2024080818

Multi-table backup

For example, to back up the t_order and t_new tables in the test database, separate the two tables with a comma:

clickhouse-backup create -t test.t_order,test.t_new ch_two_bak_202400818

Restore

Common usage

Full restore

clickhouse-backup restore <backup_name>

Example:

clickhouse-backup restore 2024-08-08T09-51-23

Restoring a single table

Syntax:

clickhouse-backup restore <backup_name> --table <db>.<table>

Example:

clickhouse-backup restore 2024-08-08T09-51-23 --table test.t_order

Restoring the schema only

Use --schema to restore only the table structure. Syntax:

clickhouse-backup restore <backup_name> --table <db>.<table> --schema

Example:

clickhouse-backup restore 2024-08-08T09-51-23 --table test.t_order --schema

Restoring data only

Use --data to restore only the table's data. Note that running it twice doubles the data. Syntax:

clickhouse-backup restore <backup_name> --table <db>.<table> --data

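Error handling

During a restore, an error like the following may occur:

error="can't create table `test`.`t_order`: code: 57, message: Directory for table data store/46c/46c04c53-3a72-48d0-a24a-568c33dbc39a/ already exists after 1 times, please check your schema dependencies"

The fix is to locate the corresponding JSON file under the backup directory's metadata, for example 2024-08-08T09-51-23/metadata/test/t_order.json, and delete UUID '46c04c53-3a72-48d0-a24a-568c33dbc39a' from the CREATE TABLE statement in its query field.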

{
 "table": "t_order",
 "database": "test",
 "parts": {
  "default": [
   {
    "name": "202203_1_1_0"
   },
   {
    "name": "202203_3_3_0"
   },
   {
    "name": "202303_2_2_0"
   }
  ]
 },
 "query": "CREATE TABLE test.t_order UUID '46c04c53-3a72-48d0-a24a-568c33dbc39a' (`id` Int64 COMMENT '订单id', `datetime` DateTime COMMENT '订单日期', `name` String COMMENT '手办名称', `price` Decimal(9, 2) COMMENT '手办价格', `user_id` Int64 COMMENT '用户id') ENGINE = MergeTree PARTITION BY toYYYYMM(datetime) ORDER BY id SETTINGS index_granularity = 8192",
 "size": {
  "default": 3306
 },
 "total_bytes": 1926,
 "metadata_only": false
}
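
Editing the file by hand works, but the same change can be scripted. The sketch below is one possible approach and not part of clickhouse-backup: strip_uuid is a hypothetical helper that removes the UUID '...' clause from the query field of a metadata JSON document like the one above. Keep a copy of the original file before overwriting it.

```python
import json
import re

# Matches " UUID '<36-character uuid>'" inside a CREATE TABLE statement.
UUID_CLAUSE = re.compile(r"\s+UUID\s+'[0-9a-fA-F-]{36}'")


def strip_uuid(metadata_text: str) -> str:
    """Return the metadata JSON with the UUID clause removed from "query"."""
    meta = json.loads(metadata_text)
    meta["query"] = UUID_CLAUSE.sub("", meta["query"], count=1)
    # ensure_ascii=False keeps non-ASCII column comments readable.
    return json.dumps(meta, ensure_ascii=False, indent=1)


# Shortened sample document; a real file has more fields and columns.
sample = json.dumps({
    "table": "t_order",
    "database": "test",
    "query": "CREATE TABLE test.t_order UUID '46c04c53-3a72-48d0-a24a-568c33dbc39a' "
             "(`id` Int64) ENGINE = MergeTree ORDER BY id",
})
print(json.loads(strip_uuid(sample))["query"])
```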

Full usage

clickhouse-backup restore <backup_name>

NAME:
   clickhouse-backup restore - Create schema and restore data from backup

USAGE:
   clickhouse-backup restore  [-t, --tables=<db>.<table>] [-m, --restore-database-mapping=<originDB>:<targetDB>[,<...>]] [--partitions=<partitions_names>] [-s, --schema] [-d, --data] [--rm, --drop] [-i, --ignore-dependencies] [--rbac] [--configs] <backup_name>

OPTIONS:
   --config value, -c value                    Config 'FILE' name. (default: "/etc/clickhouse-backup/config.yml") [$CLICKHOUSE_BACKUP_CONFIG]
   --table value, --tables value, -t value     Restore only database and objects which matched with table name patterns, separated by comma, allow ? and * as wildcard
   --restore-database-mapping value, -m value  Define the rule to restore data. For the database not defined in this struct, the program will not deal with it.
   --partitions partition_id                   Restore backup only for selected partition names, separated by comma
If PARTITION BY clause returns numeric not hashed values for partition_id field in system.parts table, then use --partitions=partition_id1,partition_id2 format
If PARTITION BY clause returns hashed string values, then use --partitions=('non_numeric_field_value_for_part1'),('non_numeric_field_value_for_part2') format
If PARTITION BY clause returns tuple with multiple fields, then use --partitions=(numeric_value1,'string_value1','date_or_datetime_value'),(...) format
Values depends on field types in your table, use single quotes for String and Date/DateTime related types
Look at the system.parts partition and partition_id fields for details https://clickhouse.com/docs/en/operations/system-tables/parts/
   --schema, -s                                        Restore schema only
   --data, -d                                          Restore data only
   --rm, --drop                                        Drop exists schema objects before restore
   -i, --ignore-dependencies                           Ignore dependencies when drop exists schema objects
   --rbac, --restore-rbac, --do-restore-rbac           Restore RBAC related objects
   --configs, --restore-configs, --do-restore-configs  Restore 'clickhouse-server' CONFIG related files
   --rbac-only                                         Restore RBAC related objects only, will skip backup data, will backup schema only if --schema added
   --configs-only                                      Restore 'clickhouse-server' configuration files only, will skip backup data, will backup schema only if --schema added