Installing and Using Filebeat

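Filebeat is a log file shipper. Once the client is installed on your server, Filebeat monitors the log directories or specific log files you point it at, tails them (tracking changes and reading new content continuously), and forwards the data to Elasticsearch or Logstash for storage. This article covers installing and deploying Filebeat.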

Installation

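# Ubuntu/Debian (assumes the Elastic APT repository has already been added)
sudo apt-get update && sudo apt-get install filebeat

# CentOS/RHEL (assumes the Elastic YUM repository has already been added)
sudo yum install filebeat

# Service management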
sudo systemctl enable filebeat
sudo systemctl start filebeat

Configuration

input

Reference

filebeat.inputs:
- type: log
  paths:
    - "/logs/*"
  fields:
    apache: true
  tags: ["json"]
  fields_under_root: true

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /logs/*.log
  tags: ["tomcat"]
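  # Multiline aggregation type: pattern (the usual choice, lines are grouped by the regex below) or count (a fixed number of lines per event)
  multiline.type: pattern
  # Regex to match; '^\d{2}' matches lines that start with two digits, such as 11 or 12
  multiline.pattern: '^\d{2}'
  # For the next two options, see the diagrams in the official docs: https://www.elastic.co/guide/en/beats/filebeat/7.17/multiline-examples.html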
  multiline.negate: true
  multiline.match: after

multiline

Reference

parsers:
- multiline:
    type: pattern
    pattern: '^\['
    negate: true
    match: after

filebeat.inputs:
- type: filestream
  enabled: true
  paths:
    - /logs/*.log
  tags: ["tomcat"]
  parsers:
    - multiline:
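        # Multiline aggregation type: pattern (the usual choice, lines are grouped by the regex below) or count (a fixed number of lines per event)
        type: pattern
        # Regex to match; '^\d{2}' matches lines that start with two digits, such as 11 or 12
        pattern: '^\d{2}'
        # For the next two options, see the diagrams in the official docs: https://www.elastic.co/guide/en/beats/filebeat/7.17/multiline-examples.html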
        negate: true
        match: after

output.elasticsearch:
  enabled: true
  hosts: ["http://192.168.1.1:9200","http://192.168.1.2:9200","http://192.168.1.3:9200"]
  index: "tomcat-error-%{+yyyy.MM.dd}"

output

Reference

output.elasticsearch:
  hosts: ["node-1:9200"]
  indices:
    - index: "cisco-beat-%{+yyyy.MM}"
      when.contains:
        event.module: "cisco"
  protocol: "https"
  ssl.certificate: "/etc/filebeat/filebeat/filebeat.crt"
  ssl.key: "/etc/filebeat/filebeat/filebeat.key"
  ssl.certificate_authorities:
    - /etc/filebeat/ca/ca.crt
  username: "elastic"
  password: "***********"

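Usage

filebeat.yml is Filebeat's core configuration file and uses YAML syntax. Its main job is to define where data is read from (Inputs) and where it is sent (Outputs); in between it can also define data processing (Processors) and metadata settings.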

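Inputs (filebeat.inputs)

This is where the log sources are configured. Filebeat supports several input types; the most common are log (legacy) and filestream (recommended in newer versions).

type: the input type, usually log or filestream.
enabled: whether the input is enabled. The option defaults to true, but the sample input in the stock filebeat.yml ships with enabled: false, so set it to true there.
paths: a list of absolute paths to log files; glob patterns are supported (for example /var/log/*.log).
tags: tags attached to the collected events, handy for filtering later in Logstash or Elasticsearch.
fields: custom fields to add (for example env: production).
multiline: (very important) multiline merging, used for logs that span several lines such as Java stack traces. With filestream it is configured under parsers.
- pattern: the regular expression tested against each line, usually anchored to the start of the line.
- negate: whether to invert the pattern match.
- match: how continuation lines are attached to an event (after/before).
ignore_older: skip files last modified longer ago than the given duration (for example 24h), so that old historical data is not read.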
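
The interaction of pattern, negate, and match is easier to see in a small model. The Python sketch below is not Filebeat's implementation; it only imitates the match: after behaviour, under the assumption that a line counts as a continuation when the pattern result differs from negate. The function name merge_multiline and the sample log lines are made up for illustration, and Python's re module stands in for Filebeat's own regex engine.

```python
import re


def merge_multiline(lines, pattern, negate=True):
    """Group lines into events, imitating `match: after`.

    A line is treated as a continuation when (pattern matched) != negate.
    Continuation lines are appended to the event that is currently open.
    """
    regex = re.compile(pattern)
    events = []
    current = []
    for line in lines:
        matched = regex.search(line) is not None
        is_continuation = matched != negate
        if is_continuation and current:
            current.append(line)
        else:
            if current:
                events.append("\n".join(current))
            current = [line]  # this line opens a new event
    if current:
        events.append("\n".join(current))
    return events


sample = [
    "2024-05-01 10:00:00 ERROR request failed",
    "java.lang.IllegalStateException: boom",
    "    at com.example.Foo.bar(Foo.java:42)",
    "2024-05-01 10:00:01 INFO recovered",
]
for event in merge_multiline(sample, r"^\d{4}-\d{2}-\d{2}"):
    print(repr(event))
```

With negate set to true, the two stack-trace lines that do not start with a date are folded into the preceding event, so the four sample lines become two events.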

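Processors (processors)

Processors apply simple filtering or enrichment before events are sent.

drop_fields: removes fields you do not need.
add_host_metadata: adds host information (IP addresses, hostname, and so on).
add_cloud_metadata: adds cloud provider metadata when running in a cloud environment.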
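
As a rough mental model of drop_fields, the sketch below removes dotted field paths from a nested event. It is an illustration in Python rather than Filebeat code: the helper name drop_fields_model and the sample event are assumptions, and processor options such as ignore_missing are not modelled.

```python
def drop_fields_model(event, fields):
    """Remove dotted field paths such as 'agent.type' from a nested dict."""
    for path in fields:
        *parents, leaf = path.split(".")
        node = event
        for key in parents:
            node = node.get(key)
            if not isinstance(node, dict):
                break  # the path does not exist, nothing to drop
        else:
            node.pop(leaf, None)
    return event


event = {
    "message": "GET /index.html 200",
    "agent": {"type": "filebeat", "version": "7.17.0", "name": "web-1"},
}
print(drop_fields_model(event, ["agent.type", "agent.version", "no.such.field"]))
```

Only the listed leaves disappear; sibling fields such as agent.name stay in place.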

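Outputs (output)

Outputs define where the data goes. Note: a running Filebeat instance can have only one output enabled.

output.elasticsearch: sends events directly to Elasticsearch.
output.logstash: sends events to Logstash for heavier parsing.
output.kafka: sends events to a Kafka message queue (to absorb traffic peaks).
- output.kafka.topics: a list of condition-based routing rules. It lets you distribute events to different Kafka topics based on their content (field values, tags, and so on).
- topic and topics can be used together, and topics takes precedence. Filebeat walks the topics list in order; an event goes to the topic of the first rule it matches, and an event that matches no rule falls back to the default topic set by topic.
output.console: prints events to the terminal (for debugging).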
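
The precedence between topics and topic can be sketched as a first-match lookup with a fallback. This is a simplified Python model, not Filebeat's code: conditions are plain Python callables, and the topic names and field values are illustrative assumptions.

```python
def get_field(event, dotted):
    """Look up a dotted path such as 'fields.type' in a nested dict."""
    node = event
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def pick_topic(event, rules, default_topic):
    """Return the topic of the first matching rule, else the default topic."""
    for rule in rules:
        if rule["when"](event):
            return rule["topic"]
    return default_topic


rules = [
    {
        "topic": "topic-nginx",
        "when": lambda e: get_field(e, "fields.type") == "nginx_access",
    },
    {
        "topic": "topic-java-error",
        "when": lambda e: get_field(e, "fields.type") == "java_app"
        and "ERROR" in (get_field(e, "message") or ""),
    },
]

java_info = {"fields": {"type": "java_app"}, "message": "INFO started"}
print(pick_topic(java_info, rules, "topic-default-logs"))
```

The Java INFO event matches neither rule, so it lands in the default topic; the same event with ERROR in its message would be routed by the second rule.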

demo1

filebeat.inputs:
- type: log
  # Whether this input is enabled; defaults to true
  enabled: true
  # Paths to read
  paths:
    - /tmp/test.log
    - /tmp/*.txt
  # Tags attached to events from this input
  tags: ["linux","DBA运维"]
  # Custom fields
  fields:
    school: "abc"
    class: "linux"

- type: log
  enabled: true
  paths:
    - /tmp/test/*/*.log
  tags: ["linux","abc"]
  fields:
    name: "oldboy"
    hobby: "linux,抖音"
  # Put the custom key-value pairs at the top level of the event.
  # Defaults to false, which nests them under a field named fields.
  fields_under_root: true

output.console:
  pretty: true

demo2

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /tmp/test.log
    - /tmp/*.txt
  tags: ["linux","DBA运维"]
  fields:
    school: "abc"
    class: "linux"

- type: log
  enabled: true
  paths:
    - /logs/*.log
  tags: ["linux","abc"]
  fields:
    name: "oldboy"
    hobby: "linux,抖音"
  fields_under_root: true

output.elasticsearch:
  enabled: true
  hosts: ["http://192.168.1.1:9200","http://192.168.1.2:9200","http://192.168.1.3:9200"]
  indices:
    - index: "linux-elk-%{+yyyy.MM.dd}"
      # Match when the given field contains this value
      when.contains:
        tags: "DBA运维"
    - index: "linux-python-%{+yyyy.MM.dd}"
      when.contains:
        tags: "abc"

filebeat.inputs:
- type: filestream
  id: mysql-general-log
  enabled: true
  paths:
    - /var/log/mysql/general.log
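  # Handle multiline entries (such as SQL statements that span lines);
  # with filestream, multiline is configured as a parser
  parsers:
    - multiline:
        type: pattern
        pattern: '^[0-9]{6} [0-9]{2}:[0-9]{2}:[0-9]{2}'
        negate: true
        match: after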
  fields:
    source: mysql
    log_type: general
  fields_under_root: true

output.elasticsearch:
  hosts: ["http://elasticsearch:9200"]
  indices:
    - index: "mysql-general-%{+yyyy.MM.dd}"
  username: "elastic"
  password: "your-password"

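# Alternative: send to Logstash instead. Only one output can be enabled at a time,
# so comment out output.elasticsearch above before enabling this.
# output.logstash:
#   hosts: ["logstash:5044"]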

Output example

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/nginx/*.log
  fields:
    type: nginx_access

- type: log
  enabled: true
  paths:
    - /var/log/java/*.log
  fields:
    type: java_app

output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092"]

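  # [Default fallback topic]
  # Events go here when none of the topics rules below match
  topic: 'topic-default-logs'

  # [Conditional routing topics]
  # Note: topics is a list, so mind the indentation and the leading "-"
  topics:
    # Rule 1: Nginx logs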
    - topic: "topic-nginx"
      when:
        equals:
          fields.type: "nginx_access"

    # Rule 2: Java logs that also contain the keyword ERROR
    - topic: "topic-java-error"
      when:
        and:
          - equals:
              fields.type: "java_app"
          - contains:
              message: "ERROR"

mysql

Reference

$ filebeat modules enable mysql

# Then edit the module configuration file

$ cat /etc/filebeat/modules.d/mysql.yml

- module: mysql
  error:
    enabled: true
    var.paths: ["/path/to/log/mysql/error.log*"]
  slowlog:
    enabled: true
    var.paths: ["/path/to/log/mysql/mysql-slow.log*"]

Collecting plain logs and sending them directly to Elasticsearch

This setup suits small and medium-sized architectures that need neither Kafka as a buffer nor Logstash for very complex cleansing.
  
############### Filebeat Configuration Example #################

# 1. Inputs
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/nginx/*.log
      - /var/log/myapp/app.log

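    # Custom fields, handy for filtering in Kibana
    fields:
      service_name: my-app-service
      env: production
    fields_under_root: true # Put the custom fields at the event root instead of under fields.service_name

    # (Optional) Multiline handling: for example, Java logs start each entry with a date; lines that do not start with a date are merged into the previous line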
    # multiline.pattern: '^\d{4}-\d{2}-\d{2}'
    # multiline.negate: true
    # multiline.match: after

# 2. Processors
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  # Drop unneeded fields to save space
  - drop_fields:
      fields: ['agent.ephemeral_id', 'agent.type', 'agent.version']

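# 3. Output: Elasticsearch
output.elasticsearch:
  # Elasticsearch cluster addresses
  hosts: ['192.168.1.101:9200', '192.168.1.102:9200']

  # Credentials (if security is enabled on Elasticsearch)
  username: 'elastic'
  password: 'your_password'

  # (Optional) Custom index name. The default is filebeat-%{[agent.version]}-%{+yyyy.MM.dd}
  # Note: changing the index name usually also requires setup.template.name and setup.template.pattern
  index: 'myapp-logs-%{+yyyy.MM.dd}'

# Needed together with a custom index name, so that writes do not fail because the template no longer matches
setup.template.name: 'myapp-logs'
setup.template.pattern: 'myapp-logs-*'
setup.ilm.enabled: false # Recommended if you do not use index lifecycle management (ILM)

# 4. Filebeat's own logging (for troubleshooting Filebeat errors)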
logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
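
To see why setup.template.pattern has to cover the custom index name, the sketch below renders an index name the way 'myapp-logs-%{+yyyy.MM.dd}' would for one event and checks it against two patterns. It is a Python approximation: render_index is a hypothetical helper, it assumes the date comes from the event timestamp in UTC, and fnmatch only stands in for Elasticsearch's wildcard matching.

```python
from datetime import datetime, timezone
from fnmatch import fnmatch


def render_index(prefix, timestamp):
    """Imitate '<prefix>-%{+yyyy.MM.dd}' for one event timestamp (UTC)."""
    return f"{prefix}-{timestamp.astimezone(timezone.utc):%Y.%m.%d}"


index = render_index(
    "myapp-logs", datetime(2024, 5, 1, 23, 30, tzinfo=timezone.utc)
)
print(index)                           # myapp-logs-2024.05.01
print(fnmatch(index, "myapp-logs-*"))  # True: the custom template applies
print(fnmatch(index, "filebeat-*"))    # False: the default pattern would miss it
```

If the pattern were left at a value that does not match the new name, the template's mappings would not be applied to the daily indices.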

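Collecting plain logs and sending them to Kafka

This setup suits high-concurrency, high-traffic architectures. Logstash or other consumers then read the data from Kafka.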
  
############### Filebeat Configuration Example #################

# 1. Inputs
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /data/logs/business/*.log

    # Tag the logs so they can be routed to different Kafka topics or handled separately in Logstash
    tags: ['business-log', 'json-format']

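# 2. Output: Kafka
output.kafka:
  # Kafka broker addresses
  hosts: ['kafka1:9092', 'kafka2:9092', 'kafka3:9092']

  # Topic selection
  # Option A: a fixed topic
  topic: 'logs-business-topic'

  # Option B: a dynamic topic (routed by a field value)
  # topic: 'logs-%{[fields.service_name]}'

  # Partitioning strategy
  partition.round_robin:
    reachable_only: false

  # Acknowledgements: 0 = do not wait, 1 = wait for the leader, -1 (all) = wait for all replicas (safest but slowest)
  required_acks: 1

  # Compression codec (gzip, snappy, lz4, zstd)
  compression: gzip

  # Maximum size in bytes of a single JSON-encoded message (default 1000000); larger events are dropped, so raise it if your log events are large
  max_message_bytes: 1000000

# 3. Processors (optional)
processors:
  - add_host_metadata:
      netinfo.enabled: true

# 4. Filebeat's own logging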
logging.level: warning

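Troubleshooting

Filebeat fails to start after a system reboot

Problem description:

Similar to the report "Filebeat Fails After Power Failure", this can be triggered intermittently by a power loss or a version upgrade. The root cause is that the abnormal shutdown leaves the registry file incomplete, so Filebeat hits EOF while decoding it. In this situation Filebeat only starts again once the registry file is deleted. Be aware that the registry holds the read offsets, so after deleting it Filebeat re-reads the monitored files from the beginning and may ship duplicate events.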

[root@xiexianbin_cn ~]# systemctl status filebeat.service
● filebeat.service - filebeat
   Loaded: loaded (/usr/lib/systemd/system/filebeat.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Fri 2018-04-18 19:51:43 CST; 1min 5s ago
     Docs: https://www.elastic.co/guide/en/beats/filebeat/current/index.html
  Process: 15760 ExecStart=/usr/share/filebeat/bin/filebeat -c /etc/filebeat/filebeat.yml -path.home /usr/share/filebeat -path.config /etc/filebeat -path.data /var/lib/filebeat -path.logs /var/log/filebeat (code=exited, status=1/FAILURE)
 Main PID: 15760 (code=exited, status=1/FAILURE)

Jun 22 19:51:43 xiexianbin_cn systemd[1]: Unit filebeat.service entered failed state.
Jun 22 19:51:43 xiexianbin_cn systemd[1]: filebeat.service failed.
Jun 22 19:51:43 xiexianbin_cn systemd[1]: filebeat.service holdoff time over, schedulin...t.
Jun 22 19:51:43 xiexianbin_cn systemd[1]: start request repeated too quickly for filebe...ce
Jun 22 19:51:43 xiexianbin_cn systemd[1]: Failed to start filebeat.
Jun 22 19:51:43 xiexianbin_cn systemd[1]: Unit filebeat.service entered failed state.
Jun 22 19:51:43 xiexianbin_cn systemd[1]: filebeat.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
[root@xiexianbin_cn ~]#

Solution:

rm -rf /var/lib/filebeat/registry
systemctl reset-failed filebeat
systemctl start filebeat

Filebeat fails to start

Error log:

Exiting: Could not start registrar: Error loading state: Error decoding states: EOF

Solution:

cd /var/lib/filebeat/
rm -rf registry
systemctl reset-failed filebeat
systemctl start filebeat
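
If you would rather keep a copy of the broken file than delete it, a check along these lines can be run while Filebeat is stopped. It is a sketch that assumes the registry is a single JSON file, as the rm commands above imply; newer Filebeat versions keep a registry directory instead, which this helper simply skips. The name quarantine_if_corrupt is hypothetical.

```python
import json
import shutil
import time
from pathlib import Path


def quarantine_if_corrupt(registry_path):
    """Move an unreadable registry file aside.

    Returns the backup path if the file was moved, otherwise None.
    """
    path = Path(registry_path)
    if not path.is_file():
        return None  # missing, or a directory in newer layouts
    try:
        json.loads(path.read_text(encoding="utf-8"))
        return None  # parses fine, leave it alone
    except ValueError:
        # Empty or truncated JSON ends up here (JSONDecodeError is a ValueError)
        backup = path.with_name(f"{path.name}.corrupt-{int(time.time())}")
        shutil.move(str(path), str(backup))
        return backup


# Example call (run as root, with Filebeat stopped):
# quarantine_if_corrupt("/var/lib/filebeat/registry")
```

After the file has been moved aside, systemctl reset-failed filebeat and systemctl start filebeat apply as before, and the same caveat about re-reading files holds.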

filebeat: Exiting: cannot obtain lockfile: cannot start, data directory belongs to process with pid xxx