

Filebeat: A Log Collection Tool

Overview

Introduction

Filebeat is a log shipper from Elastic. It has multiple built-in modules (Apache, NGINX, MySQL, and so on) that can collect the logs of common applications with a single command.

On the official site (linked in the Installation section below), one detail stands out: Filebeat lives under the beats/ path. Besides Filebeat, Elastic offers several other Beats:

  • Metricbeat: a lightweight shipper for metric data.
  • Packetbeat: a lightweight shipper for network data.
  • Winlogbeat: a lightweight shipper for Windows event logs.
  • Auditbeat: a lightweight shipper for audit data.
  • Heartbeat: a lightweight shipper for uptime monitoring.
  • Functionbeat: a serverless shipper for collecting cloud data.

Features

  1. A lightweight log shipper: it consumes few resources and has minimal requirements on the host machine.
  2. Simple to operate: collected log entries can be sent directly to Kafka, Logstash, Elasticsearch, and others.
  3. After an abnormal interruption, it resumes from where it stopped on restart.
    (It records each log file's offset in ${filebeat_home}/data/registry.)
  4. It transfers data using a backpressure-sensitive protocol: when the destination (e.g., Logstash) is busy, Filebeat slows down its read and transfer rate; once the destination recovers, Filebeat speeds back up.
  5. Filebeat ships with internal modules (Apache, NGINX, MySQL, etc.) that can collect the logs of common applications with a single command (see the sketch after this list).
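For instance, listing and enabling a built-in module takes one command each; a minimal sketch using Filebeat's modules subcommand, run from the Filebeat directory:

./filebeat modules list            # show enabled and available modules
./filebeat modules enable nginx    # enable the built-in NGINX module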

Comparison with Logstash

  • Filebeat is more lightweight: it has a smaller footprint and uses fewer system resources; naturally, its functionality is simpler.
  • Logstash consumes more resources, but it is also far more capable, with a large number of input, filter, and output plugins for collecting and transforming data from all kinds of sources.
  • As for implementation language, Filebeat is written in Go; Logstash is written in Java, with plugins written in JRuby.

If we deployed a Logstash instance for log collection on every machine running our business systems, it would consume far more resources than deploying Filebeat.

Installation

  • Official site: https://www.elastic.co/cn/products/beats/filebeat

Linux

On Linux there are several ways to install: via the .rpm, .deb, or .tar.gz package.

rpm and deb

Installing via the .rpm or .deb package:

  • The application is installed under /usr/share/filebeat/.
  • The configuration files live under /etc/filebeat/.
  • A launcher script is also placed at /usr/bin/filebeat.

Installing via the .deb package may produce the following error:

N: Download is performed unsandboxed as root as file '/root/filebeat-8.6.1-amd64.deb' couldn't be accessed by user '_apt'. - pkgAcquire::Run (13: Permission denied)

The fix is to move the package to the /tmp directory and install from there.
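A sketch of that workaround, assuming the package was downloaded to /root:

mv /root/filebeat-8.6.1-amd64.deb /tmp/
apt install /tmp/filebeat-8.6.1-amd64.deb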

tar

For the .tar.gz package, simply extract it:

tar -zxvf filebeat-8.6.1-linux-x86_64.tar.gz

Windows

On Windows there are two installation methods, corresponding to two package types:

  • .msi: Windows MSI x86_64 (beta)
  • .zip: Windows ZIP x86_64

Installing via the .msi package:

  • The application is installed under C:\Program Files\Elastic\Beats\
  • The configuration files live under C:\ProgramData\Elastic\Beats\filebeat\

Installing via the .zip package is no different from the .tar approach above: just extract the archive.

Quick Start

As an example, we take the Java program from the "后台运行" (running in the background) subsection of 《Linux操作系统使用入门:2.命令》 and have Filebeat write its collected logs out to a file (the File output).

Configuration

filebeat.yml

The configuration file filebeat.yml is located at:

/usr/local/filebeat/filebeat-8.6.1-linux-x86_64/filebeat.yml

We focus on two parts of filebeat.yml:

  • Filebeat inputs
  • Outputs

Filebeat inputs

# ============================== Filebeat inputs ===============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

# filestream is an input for collecting log messages from files.
- type: filestream

  # Unique ID among all inputs, an ID is required.
  id: my-filestream-id

  # Change to true to enable this input configuration.
  enabled: false

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/*.log
    #- c:\programdata\elasticsearch\logs\*

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  # Line filtering happens after the parsers pipeline. If you would like to filter lines
  # before parsers, use include_message parser.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  # Line filtering happens after the parsers pipeline. If you would like to filter lines
  # before parsers, use include_message parser.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #prospector.scanner.exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

We make two changes:

  • Change enabled: false to enabled: true to switch the input on.

  • Comment out /var/log/*.log and replace it with the log we want to monitor, /root/f/f.log.

Filebeat supports far more inputs than this one. According to the official documentation, the list includes: AWS CloudWatch, AWS S3, Azure Event Hub, Azure Blob Storage, CEL, Cloud Foundry, CometD, Container, filestream, GCP Pub/Sub, HTTP Endpoint, HTTP JSON, journald, Kafka, Log (deprecated in 7.16.0, use filestream), MQTT, NetFlow, Office 365 Management Activity API, Redis, Stdin, Syslog, TCP, UDP, and Google Cloud Storage.
The one we use most is filestream.
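Putting the two changes together, the modified input section looks roughly like this (the path /root/f/f.log is the one used throughout this walkthrough):

filebeat.inputs:
- type: filestream
  id: my-filestream-id
  enabled: true        # was: false
  paths:
    - /root/f/f.log    # was: /var/log/*.log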

Outputs

# ================================== Outputs ===================================

# Configure what output to use when sending the data collected by the beat.

# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]

  # Protocol - either `http` (default) or `https`.
  #protocol: "https"

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  #username: "elastic"
  #password: "changeme"

# ------------------------------ Logstash Output -------------------------------
#output.logstash:
  # The Logstash hosts
  #hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

Next, we add the following to the Outputs section:

output.file:
  path: "/root/fb"
  filename: filebeat
  #rotate_every_kb: 10000

  • There is also an enabled property, whose default value is true.

We must either comment out the Elasticsearch output, which is enabled by default, or give it an enabled property set to false:

output.elasticsearch:
  enabled: false
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]

In the Outputs section, the official sample shows only the Elasticsearch Output and the Logstash Output.
In fact, Filebeat supports far more. According to the official documentation, the supported outputs are:

  • Elasticsearch Service
  • Elasticsearch
  • Logstash
  • Kafka
  • Redis
  • File
  • Console
  • Change the output codec

Notes:

  • Of the commonly recognized message queues, Filebeat supports only Kafka. Redis can also serve as a message queue, but it is generally regarded as a caching technology.
  • Elasticsearch Service refers to managed Elasticsearch offered by cloud vendors in the software-as-a-service (SaaS) model.
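As an aside, the Console output is handy for quick local debugging; a minimal sketch that pretty-prints each event to stdout:

output.console:
  pretty: true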

Startup

Run the filebeat binary in the filebeat-8.6.1-linux-x86_64 directory:

./filebeat
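Incidentally, the configuration can be checked before a real run; a short sketch using Filebeat's built-in subcommands:

./filebeat test config    # validate filebeat.yml
./filebeat test output    # verify connectivity to the configured output
./filebeat -e             # run in the foreground, logging to stderr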

After starting, we see output like the following:


[part of the output omitted]

{"@timestamp":"2023-02-07T12:01:33.196Z","@metadata":{"beat":"filebeat","type":"_doc","version":"8.6.1"},"log":{"file":{"path":"/root/f/f.log"},"offset":22593},"message":"2023-02-07T12:01:32.548Z","input":{"type":"filestream"},"host":{"os":{"name":"Ubuntu","kernel":"5.15.0-58-generic","codename":"jammy","type":"linux","platform":"ubuntu","version":"22.04.1 LTS (Jammy Jellyfish)","family":"debian"},"id":"96cdead708f76e49817e40c345eaf098","containerized":false,"name":"kaka-Parallels-Virtual-Platform","ip":["10.211.55.19","fdb2:2c26:f4e4:0:de8b:9ad8:6982:65d7","fdb2:2c26:f4e4:0:3d0f:cb3f:c8f3:6fee","fe80::c344:f7d4:956e:be78"],"mac":["00-1C-42-79-16-1E"],"hostname":"kaka-Parallels-Virtual-Platform","architecture":"x86_64"},"agent":{"id":"3ec9bcd9-e9ef-4334-84f8-07490b101660","name":"kaka-Parallels-Virtual-Platform","type":"filebeat","version":"8.6.1","ephemeral_id":"ffd132f9-a32f-4ef9-a754-24b23368d535"},"ecs":{"version":"8.0.0"}}
{"@timestamp":"2023-02-07T12:01:35.204Z","@metadata":{"beat":"filebeat","type":"_doc","version":"8.6.1"},"message":"2023-02-07T12:01:33.578Z","input":{"type":"filestream"},"ecs":{"version":"8.0.0"},"host":{"hostname":"kaka-Parallels-Virtual-Platform","architecture":"x86_64","os":{"platform":"ubuntu","version":"22.04.1 LTS (Jammy Jellyfish)","family":"debian","name":"Ubuntu","kernel":"5.15.0-58-generic","codename":"jammy","type":"linux"},"id":"96cdead708f76e49817e40c345eaf098","containerized":false,"ip":["10.211.55.19","fdb2:2c26:f4e4:0:de8b:9ad8:6982:65d7","fdb2:2c26:f4e4:0:3d0f:cb3f:c8f3:6fee","fe80::c344:f7d4:956e:be78"],"name":"kaka-Parallels-Virtual-Platform","mac":["00-1C-42-79-16-1E"]},"agent":{"version":"8.6.1","ephemeral_id":"ffd132f9-a32f-4ef9-a754-24b23368d535","id":"3ec9bcd9-e9ef-4334-84f8-07490b101660","name":"kaka-Parallels-Virtual-Platform","type":"filebeat"},"log":{"offset":22618,"file":{"path":"/root/f/f.log"}}}
{"@timestamp":"2023-02-07T12:01:35.204Z","@metadata":{"beat":"filebeat","type":"_doc","version":"8.6.1"},"log":{"file":{"path":"/root/f/f.log"},"offset":22643},"message":"2023-02-07T12:01:34.580Z","input":{"type":"filestream"},"ecs":{"version":"8.0.0"},"host":{"os":{"kernel":"5.15.0-58-generic","codename":"jammy","type":"linux","platform":"ubuntu","version":"22.04.1 LTS (Jammy Jellyfish)","family":"debian","name":"Ubuntu"},"id":"96cdead708f76e49817e40c345eaf098","name":"kaka-Parallels-Virtual-Platform","containerized":false,"ip":["10.211.55.19","fdb2:2c26:f4e4:0:de8b:9ad8:6982:65d7","fdb2:2c26:f4e4:0:3d0f:cb3f:c8f3:6fee","fe80::c344:f7d4:956e:be78"],"mac":["00-1C-42-79-16-1E"],"hostname":"kaka-Parallels-Virtual-Platform","architecture":"x86_64"},"agent":{"name":"kaka-Parallels-Virtual-Platform","type":"filebeat","version":"8.6.1","ephemeral_id":"ffd132f9-a32f-4ef9-a754-24b23368d535","id":"3ec9bcd9-e9ef-4334-84f8-07490b101660"}}

[part of the output omitted]

Formatted as JSON, one event looks like this:

{
  "@timestamp": "2023-02-07T12:04:59.025Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "8.6.1"
  },
  "message": "2023-02-07T12:04:59.025Z",
  "log": {
    "file": {
      "path": "/root/f/f.log"
    },
    "offset": 22693
  },
  "input": {
    "type": "filestream"
  },
  "agent": {
    "version": "8.6.1",
    "ephemeral_id": "ffd132f9-a32f-4ef9-a754-24b23368d535",
    "id": "3ec9bcd9-e9ef-4334-84f8-07490b101660",
    "name": "kaka-Parallels-Virtual-Platform",
    "type": "filebeat"
  },
  "ecs": {
    "version": "8.0.0"
  },
  "host": {
    "os": {
      "family": "debian",
      "name": "Ubuntu",
      "kernel": "5.15.0-58-generic",
      "codename": "jammy",
      "type": "linux",
      "platform": "ubuntu",
      "version": "22.04.1 LTS (Jammy Jellyfish)"
    },
    "id": "96cdead708f76e49817e40c345eaf098",
    "name": "kaka-Parallels-Virtual-Platform",
    "containerized": false,
    "ip": ["10.211.55.19", "fdb2:2c26:f4e4:0:de8b:9ad8:6982:65d7", "fdb2:2c26:f4e4:0:3d0f:cb3f:c8f3:6fee", "fe80::c344:f7d4:956e:be78"],
    "mac": ["00-1C-42-79-16-1E"],
    "hostname": "kaka-Parallels-Virtual-Platform",
    "architecture": "x86_64"
  }
}

  • @timestamp: the current timestamp; a field added by Filebeat.
  • log.file.path: the path of the log file being collected.
  • log.offset: the byte offset.
  • agent: information about the Beat itself.
  • The log line itself is carried in the message field (a pretty-printing sketch follows this list).
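The File output writes one JSON object per line; to pretty-print events like the above yourself, a tool such as jq works. A sketch, assuming the path and filename from our configuration (the exact rotated filenames may vary):

jq . /root/fb/filebeat*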

Integrating with Kafka

Environment Preparation

  • Two applications write logs, one on a Linux machine and one on a Windows machine; Filebeat collects their logs and sends them to Kafka. These act as the Kafka producers.
  • A third application receives and consumes the Kafka messages. This is the Kafka consumer.

The application writing logs on the Linux machine again reuses the Java program from the "后台运行" (running in the background) subsection of 《Linux操作系统使用入门:2.命令》, with one adjustment: each log line is prefixed with [Java] before the timestamp.

The application writing logs on the Windows machine is a Python script:

import logging
import datetime
import time

logging.basicConfig(filename='example.log', level=logging.INFO)

while True:
    logging.info('[Python]' + datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f'))
    time.sleep(1)
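When run with python, the script appends one [Python]-prefixed line per second to example.log in its working directory; that file is what Filebeat on the Windows machine watches.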

The consumer is a Spring Boot project; its key code is as follows:

package com.kakawanyifan.spbf;

import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
@Slf4j
public class MessageListener {

    @KafkaListener(topics = "Java")
    public void onJavaMessage(ConsumerRecord<String, String> record) {
        log.info("消费:" + record.value());
    }

    @KafkaListener(topics = "Python")
    public void onPythonMessage(ConsumerRecord<String, String> record) {
        log.info("消费:" + record.value());
    }
}
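For completeness, a minimal sketch of the matching Spring Boot configuration; the broker address is the one used in this walkthrough, while the consumer group id is a hypothetical name:

# application.yml (sketch)
spring:
  kafka:
    bootstrap-servers: 10.211.55.14:9092
    consumer:
      group-id: log-consumer   # hypothetical consumer group
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer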

Filebeat Configuration

We won't repeat the Filebeat inputs configuration; just set the paths accordingly. The part to focus on is Outputs:

output.kafka:
  hosts: ["10.211.55.14:9092"]
  topic: "logs"
  topics:
    - topic: "Java"
      when.contains:
        message: "Java"
    - topic: "Python"
      when.contains:
        message: "Python"

More detailed options are covered in the official documentation.
Address: https://www.elastic.co/guide/en/beats/filebeat/current/kafka-output.html

Testing

The consumer prints logs like the following:


[part of the output omitted]

2023-02-08 20:50:09.850 INFO 2719 --- [ntainer#1-0-C-1] com.kakawanyifan.spbf.MessageListener : 消费:{"@timestamp":"2023-02-08T12:50:08.767Z","@metadata":{"beat":"filebeat","type":"_doc","version":"8.6.1"},"log":{"offset":148166,"file":{"path":"C:\\t\\example.log"}},"message":"INFO:root:[Python]2023-02-08 20:50:07.334674","input":{"type":"filestream"},"ecs":{"version":"8.0.0"},"host":{"ip":["fdb2:2c26:f4e4:0:c84b:3ac5:111a:206c","fdb2:2c26:f4e4:0:7923:ede7:bdc8:bdc7","fe80::c84b:3ac5:111a:206c","10.211.55.8"],"mac":["00-1C-42-66-F6-03"],"hostname":"DESKTOP-L4CK6OM","architecture":"x86_64","os":{"version":"10.0","family":"windows","name":"Windows 10 Pro","kernel":"10.0.19041.1706 (WinBuild.160101.0800)","build":"19042.1706","type":"windows","platform":"windows"},"id":"e57521ce-25a2-442c-9ab8-4f3d6c1ec9b3","name":"DESKTOP-L4CK6OM"},"agent":{"type":"filebeat","version":"8.6.1","ephemeral_id":"b5e6cad0-9dae-4604-aa2b-76005df89a0f","id":"22c942d5-ea8c-49a2-85f4-398d05896ebb","name":"DESKTOP-L4CK6OM"}}
2023-02-08 20:50:09.852 INFO 2719 --- [ntainer#1-0-C-1] com.kakawanyifan.spbf.MessageListener : 消费:{"@timestamp":"2023-02-08T12:50:08.767Z","@metadata":{"beat":"filebeat","type":"_doc","version":"8.6.1"},"log":{"offset":148212,"file":{"path":"C:\\t\\example.log"}},"message":"INFO:root:[Python]2023-02-08 20:50:08.335702","input":{"type":"filestream"},"ecs":{"version":"8.0.0"},"host":{"name":"DESKTOP-L4CK6OM","ip":["fdb2:2c26:f4e4:0:c84b:3ac5:111a:206c","fdb2:2c26:f4e4:0:7923:ede7:bdc8:bdc7","fe80::c84b:3ac5:111a:206c","10.211.55.8"],"mac":["00-1C-42-66-F6-03"],"hostname":"DESKTOP-L4CK6OM","architecture":"x86_64","os":{"platform":"windows","version":"10.0","family":"windows","name":"Windows 10 Pro","kernel":"10.0.19041.1706 (WinBuild.160101.0800)","build":"19042.1706","type":"windows"},"id":"e57521ce-25a2-442c-9ab8-4f3d6c1ec9b3"},"agent":{"name":"DESKTOP-L4CK6OM","type":"filebeat","version":"8.6.1","ephemeral_id":"b5e6cad0-9dae-4604-aa2b-76005df89a0f","id":"22c942d5-ea8c-49a2-85f4-398d05896ebb"}}
2023-02-08 20:50:11.369 INFO 2719 --- [ntainer#0-0-C-1] com.kakawanyifan.spbf.MessageListener : 消费:{"@timestamp":"2023-02-08T12:50:10.536Z","@metadata":{"beat":"filebeat","type":"_doc","version":"8.6.1"},"input":{"type":"filestream"},"ecs":{"version":"8.0.0"},"host":{"name":"kaka-Parallels-Virtual-Platform","mac":["00-1C-42-79-16-1E"],"hostname":"kaka-Parallels-Virtual-Platform","architecture":"x86_64","os":{"codename":"jammy","type":"linux","platform":"ubuntu","version":"22.04.1 LTS (Jammy Jellyfish)","family":"debian","name":"Ubuntu","kernel":"5.15.0-58-generic"},"id":"96cdead708f76e49817e40c345eaf098","containerized":false,"ip":["10.211.55.19","fdb2:2c26:f4e4:0:cb37:d7eb:2dca:e1f6","fdb2:2c26:f4e4:0:3d0f:cb3f:c8f3:6fee","fe80::c344:f7d4:956e:be78"]},"agent":{"id":"3ec9bcd9-e9ef-4334-84f8-07490b101660","name":"kaka-Parallels-Virtual-Platform","type":"filebeat","version":"8.6.1","ephemeral_id":"c0a087ba-30eb-49bc-b383-f7f03ceeb55d"},"log":{"offset":90805,"file":{"path":"/root/f/f.log"}},"message":"[Java]2023-02-08T12:50:08.618Z"}
2023-02-08 20:50:11.372 INFO 2719 --- [ntainer#0-0-C-1] com.kakawanyifan.spbf.MessageListener : 消费:{"@timestamp":"2023-02-08T12:50:10.536Z","@metadata":{"beat":"filebeat","type":"_doc","version":"8.6.1"},"input":{"type":"filestream"},"ecs":{"version":"8.0.0"},"host":{"architecture":"x86_64","os":{"codename":"jammy","type":"linux","platform":"ubuntu","version":"22.04.1 LTS (Jammy Jellyfish)","family":"debian","name":"Ubuntu","kernel":"5.15.0-58-generic"},"id":"96cdead708f76e49817e40c345eaf098","containerized":false,"name":"kaka-Parallels-Virtual-Platform","ip":["10.211.55.19","fdb2:2c26:f4e4:0:cb37:d7eb:2dca:e1f6","fdb2:2c26:f4e4:0:3d0f:cb3f:c8f3:6fee","fe80::c344:f7d4:956e:be78"],"mac":["00-1C-42-79-16-1E"],"hostname":"kaka-Parallels-Virtual-Platform"},"agent":{"type":"filebeat","version":"8.6.1","ephemeral_id":"c0a087ba-30eb-49bc-b383-f7f03ceeb55d","id":"3ec9bcd9-e9ef-4334-84f8-07490b101660","name":"kaka-Parallels-Virtual-Platform"},"log":{"offset":90836,"file":{"path":"/root/f/f.log"}},"message":"[Java]2023-02-08T12:50:09.619Z"}
2023-02-08 20:50:11.867 INFO 2719 --- [ntainer#1-0-C-1] com.kakawanyifan.spbf.MessageListener : 消费:{"@timestamp":"2023-02-08T12:50:10.772Z","@metadata":{"beat":"filebeat","type":"_doc","version":"8.6.1"},"agent":{"name":"DESKTOP-L4CK6OM","type":"filebeat","version":"8.6.1","ephemeral_id":"b5e6cad0-9dae-4604-aa2b-76005df89a0f","id":"22c942d5-ea8c-49a2-85f4-398d05896ebb"},"ecs":{"version":"8.0.0"},"log":{"offset":148258,"file":{"path":"C:\\t\\example.log"}},"message":"INFO:root:[Python]2023-02-08 20:50:09.337150","input":{"type":"filestream"},"host":{"mac":["00-1C-42-66-F6-03"],"hostname":"DESKTOP-L4CK6OM","name":"DESKTOP-L4CK6OM","architecture":"x86_64","os":{"platform":"windows","version":"10.0","family":"windows","name":"Windows 10 Pro","kernel":"10.0.19041.1706 (WinBuild.160101.0800)","build":"19042.1706","type":"windows"},"id":"e57521ce-25a2-442c-9ab8-4f3d6c1ec9b3","ip":["fdb2:2c26:f4e4:0:c84b:3ac5:111a:206c","fdb2:2c26:f4e4:0:7923:ede7:bdc8:bdc7","fe80::c84b:3ac5:111a:206c","10.211.55.8"]}}

[part of the output omitted]

ELK

More often, Filebeat is used together with the ELK stack, in the architecture shown in the figure:

[Figure: ELK architecture]

  • Elasticsearch is a search and analytics database.
  • Logstash is a server-side data processing pipeline that can ingest data from multiple sources at once, transform it, and send it to Elasticsearch.
  • Kibana lets users visualize the data in Elasticsearch with graphs and charts.
  • Filebeat's role is to minimize the performance impact on the machines being collected from; it is a lightweight, single-purpose data shipper.

We won't discuss building and operating ELK itself. One point worth noting from various references: the versions across the ELK stack should match, including Filebeat's.

The same applies to Filebeat's modules: although they require only simple configuration, their results are only visible when paired with ELK, so we won't cover them here either.

The figure shows the Kafka module's dashboard in ELK:

[Figure: Kafka module dashboard]

How It Works

Collecting Logs

  • For every log line it collects, Filebeat produces a JSON-format object called a log event.
  • Filebeat's main components:
    • input: the input side.
    • output: the output side.
    • harvester: the component that actually reads log files.
  • Filebeat periodically scans the log files; when it finds that a file's last modified time has changed, it creates a harvester to collect it.
    • One harvester is created per log file; it reads the text line by line, converts each line into a log event, and sends it to the output.
      • Log lines must be separated by newline characters, and the last line must also end with a newline to count as a complete line.
    • A harvester opens a file descriptor when it starts reading and closes it only when it finishes.
      • By default it reads all the way to the end of the file, and closes the file only after it has gone without updates for longer than close_inactive (a sketch of the relevant options follows this list).
  • If a log file A is rotated while Filebeat is collecting it, for example renamed to B (mv A B), Filebeat applies the following rules:
    • If file A was not yet open, it can no longer be collected, because file A no longer exists.
    • If file A was open, Filebeat keeps reading it to the end, then checks the file once every backoff interval:
      • If file A is recreated within the backoff interval (e.g., touch A), Filebeat considers the file renamed.
        • With the default close_renamed: false, it collects both file A and file B, closing file B only when close_inactive or a similar condition triggers.
      • If file A still has not been recreated after the backoff interval, Filebeat considers the file deleted.
        • With the default close_removed: true, it immediately closes file B without collecting it, while file A cannot be collected because it does not exist.
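A sketch of the options mentioned above, shown at their default values; note that these names belong to the older log input, while the newer filestream input exposes equivalents under different names (for example close.on_state_change.inactive):

- type: log
  paths:
    - /var/log/*.log
  close_inactive: 5m     # close the handle after 5 minutes without new data
  close_renamed: false   # keep reading a file after it has been renamed
  close_removed: true    # close the handle as soon as the file is removed
  backoff: 1s            # wait this long before re-checking a file that hit EOF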

The Registry

  • Filebeat records the current state of every log file in the registry.
    • Even if only one log file was modified, it writes the current state of all log files to the registry.
    • The registry lives under the data/registry/ directory; deleting that directory causes all log files to be collected again from scratch.

The structure of the data/registry/ directory:

data/registry/filebeat/
├── 237302.json   # snapshot file, named after the sequence number of the last operation
├── active.dat    # records the path of the current snapshot file
├── log.json      # records the state of the log files; when it grows past 10 MB it is cleared, and the current state of all files is saved into a snapshot file
└── meta.json     # some metadata

An example from log.json:

{"op":"set", "id":237302}                             // 本次动作的编号
{
"k": "filebeat::logs::native::778887-64768", // key ,由 beat 类型、日志文件的 id 组成
"v": {
"id": "native::778887-64768", // 日志文件的 id ,由 identifier_name、inode、device 组成
"prev_id": "",
"ttl": -1, // -1 表示永不失效
"type": "log",
"source": "/var/log/supervisor/supervisord.log", // 日志文件的路径(文件被重命名之后,并不会更新该参数)
"timestamp": [2061628216741, 1611303609], // 日志文件最后一次修改的 Unix 时间戳
"offset": 1343, // 当前采集的字节偏移量,表示最后一次采集的日志行的末尾位置
"identifier_name": "native", // 识别日志文件的方式,native 表示原生方式,即根据 inode 和 device 编号识别
"FileStateOS": { // 文件的状态
"inode": 778887, // 文件的 inode 编号
"device": 64768 // 文件所在的磁盘编号
}
}
}

  • For each log file, Filebeat records the number of bytes already collected (the bytes offset).
    • Each time a harvester reads the file, it resumes from that offset.
    • If a harvester finds the file smaller than the recorded offset, it assumes the file was truncated and starts over from offset 0, which can lead to duplicate collection.

Sending Logs

  • After processing the collected log events, Filebeat sends them to the output; this process is called publishing events.
    • Events are kept in memory and are not written to disk.
    • An event counts as sent only once it reaches the output and an acknowledgment is received.
      • If sending an event fails, Filebeat retries automatically, and only updates the registry once the send succeeds.
      • Collected events are therefore delivered at least once; however, if Filebeat restarts before an acknowledgment arrives, events may be sent again.

The concrete contents of an event are exactly what we saw written to the file in the Quick Start example above.
