From 小言_互联网's blog

kubernetes - filebeat automatically merges Docker log lines split at 16KB


version: filebeat 7.0.0

When Docker uses the json-file logging driver, it splits any log line longer than 16KB into multiple entries, like this:

{"log":"xxxx ...... xxxx","stream":"stdout","time":"2018-04-12T03:02:09.889713897Z"}
{"log":"xxxx ...... xxxx","stream":"stdout","time":"2018-04-12T03:02:09.889713897Z"}
{"log":"xxxx ...... xxxx\r\n","stream":"stdout","time":"2018-04-12T03:02:09.889713897Z"}
{"log":" \"GET /hello HTTP\"-\"\n","stream":"stdout","time":"2019-04-15T02:05:54.021985828Z"}
{"log":" \"GET / HTTP\"-\"\n","stream":"stdout","time":"2019-04-15T02:06:55.072985828Z"}

Note: each complete log line ends with \r\n or \n. If the value of the "log" key does not end with a newline, that entry is only part of a line. Entries 1-3 above are actually a single original line longer than 16KB that Docker split into three entries (note that their time fields are identical), while the last two entries were not split.
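The merge rule can be sketched in a few lines of Python (a simplified illustration of the idea, not filebeat's actual code): accumulate the "log" values of consecutive entries until one ends with a newline.

```python
import json

def merge_docker_json_lines(lines):
    """Merge docker json-file entries split at 16KB: an entry whose
    "log" value does not end with a newline is a partial line, so
    keep accumulating until one ends with \\n (or \\r\\n)."""
    merged = []
    buffer = ""
    for line in lines:
        entry = json.loads(line)
        buffer += entry["log"]
        if buffer.endswith("\n"):
            merged.append(buffer.rstrip("\r\n"))
            buffer = ""
    if buffer:
        merged.append(buffer)  # trailing partial line, if any
    return merged

# The three split entries above collapse into one logical line:
lines = [
    '{"log":"part1","stream":"stdout","time":"t"}',
    '{"log":"part2","stream":"stdout","time":"t"}',
    '{"log":"part3\\r\\n","stream":"stdout","time":"t"}',
]
print(merge_docker_json_lines(lines))  # → ['part1part2part3']
```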

Because of this, reading the logs with filebeat's log input type is troublesome: it does not automatically merge the split lines.

filebeat.inputs:
- type: log
  paths:
    - /var/log/messages
    - /var/log/*.log

To solve this, we switch the input to the docker type, which automatically merges lines that were split for exceeding 16KB back into a single event:

filebeat.inputs:
- type: docker
  containers.ids: 
    - '8b6fe7dc9e067b58476dc57d6986dd96d7100430c5de3b109a99cd56ac655347'

Now let's run a test.
1. Build an image whose container prints lines longer than 16KB, and run it:

# cat start.sh 
#!/bin/sh
while true; do
  key="somerandomkey_"
  value="somerandomvalue_"
  echo -n '{'
  for i in $(seq 420); do
    echo -n "\"${key}${i}\":\"${value}${i}\","
  done
  echo '"lastkey":"end"}'
  sleep 5
done
# cat Dockerfile 
FROM alpine:latest
COPY start.sh /usr/bin/start.sh
CMD start.sh
$ docker build --tag echo2:dev .
  
$ docker run --name echo2 -d echo2:dev 
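A quick back-of-the-envelope check (recomputing the loop in start.sh above) confirms that each printed line exceeds Docker's 16KB threshold:

```python
# Recompute the length of one line printed by start.sh.
size = 1  # opening '{'
for i in range(1, 421):  # the script's `seq 420`
    size += len(f'"somerandomkey_{i}":"somerandomvalue_{i}",')
size += len('"lastkey":"end"}')
print(size, size > 16 * 1024)  # → 17441 True
```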

Inspect the container's log file: the line has indeed been split, and reading it with type: log would not merge the split entries:

# docker inspect echo2 |jq '.[0]["LogPath"]'
"/var/lib/docker/containers/5af7d648adb7e8c3be986f8e6fe9b84cdeac3fdb3e087beaf1b5eb8e8bb90629/5af7d648adb7e8c3be986f8e6fe9b84cdeac3fdb3e087beaf1b5eb8e8bb90629-json.log"

# tail /var/lib/docker/containers/5af7d648adb7e8c3be986f8e6fe9b84cdeac3fdb3e087beaf1b5eb8e8bb90629/5af7d648adb7e8c3be986f8e6fe9b84cdeac3fdb3e087beaf1b5eb8e8bb90629-json.log

2. Look up the container's ID:

# docker inspect echo2 |jq '.[0]["Id"]'
"5af7d648adb7e8c3be986f8e6fe9b84cdeac3fdb3e087beaf1b5eb8e8bb90629"

3. Use filebeat (version 7.0.0 here) to read the container's log and write the output to a file.

The filebeat configuration file is as follows:

# cat docker.yaml 
filebeat.inputs:
- type: docker
  containers.ids:
    - "5af7d648adb7e8c3be986f8e6fe9b84cdeac3fdb3e087beaf1b5eb8e8bb90629"
  processors:
    - add_docker_metadata: ~

output.file:
  path: "/tmp/filebeat"
  filename: docker.log

Run filebeat:

# ./filebeat -e -c ./docker.yaml

Check the output file:

# tail /tmp/filebeat/docker.log

As you can see, the lines that Docker split are merged back automatically when filebeat reads them and writes the result to /tmp/filebeat/docker.log.
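The merge can also be confirmed programmatically. Filebeat's file output writes one JSON event per line, with the merged line in the "message" field, so checking that field's length against 16384 is enough (a small sketch; the path and field name follow the filebeat 7 file output used above):

```python
import json

def event_message_lengths(ndjson_lines):
    """Return the length of the "message" field of each filebeat
    file-output event (one JSON object per line)."""
    return [len(json.loads(line).get("message", "")) for line in ndjson_lines]

# Against the real output this should report lengths above 16384:
# with open("/tmp/filebeat/docker.log") as f:
#     print(event_message_lengths(f))
```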

Problems encountered during testing:

Problem 1: the container's log file had been edited by hand, corrupting its JSON format

  
"log":"xxxx ...... xxxx","stream":"stdout","time":"2019-04-12T03:02:09.889712897Z"}
{"log":"xxxx ...... xxxx","stream":"stdout","time":"2019-04-12T03:02:09.889712897Z"}
{"log":"xxxx ...... xxxx\r\n","stream":"stdout","time":"2019-04-12T03:02:09.889712897Z"}

The log sample above has two problems:

1. The first line is blank (or contains only whitespace);

2. The second line is missing the leading "{", so it is not valid JSON;

Running filebeat against such a file produces the following invalid CRI log format error:

./filebeat -e -c ./docker.yaml

....

2019-04-15T14:37:44.457+0800	INFO	log/input.go:138	Configured paths: [/var/lib/docker/containers/aea78b5dceed672e52b7d74bad7b70727f7a016ca3281f45daec9dbb4509b749/*.log]
2019-04-15T14:37:44.457+0800	INFO	input/input.go:114	Starting input of type: docker; ID: 17536676907717185094 
2019-04-15T14:37:44.457+0800	INFO	log/harvester.go:254	Harvester started for file: /var/lib/docker/containers/aea78b5dceed672e52b7d74bad7b70727f7a016ca3281f45daec9dbb4509b749/aea78b5dceed672e52b7d74bad7b70727f7a016ca3281f45daec9dbb4509b749-json.log
2019-04-15T14:37:44.457+0800	ERROR	log/harvester.go:281	Read line error: invalid CRI log format; File: /var/lib/docker/containers/aea78b5dceed672e52b7d74bad7b70727f7a016ca3281f45daec9dbb4509b749/aea78b5dceed672e52b7d74bad7b70727f7a016ca3281f45daec9dbb4509b749-json.log

As long as every line of the container's log file is valid JSON, this error will not occur.
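A small pre-flight check, sketched here in Python (not part of filebeat), catches exactly these two kinds of corruption before filebeat trips over them:

```python
import json

def check_docker_log_file(path):
    """Report lines that would break filebeat's docker-format parsing:
    blank lines, lines that are not valid JSON, and JSON objects
    missing the "log" key."""
    bad = []
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            stripped = line.strip()
            if not stripped:
                bad.append((lineno, "blank line"))
                continue
            try:
                entry = json.loads(stripped)
            except ValueError:
                bad.append((lineno, "invalid JSON"))
                continue
            if not isinstance(entry, dict) or "log" not in entry:
                bad.append((lineno, 'missing "log" key'))
    return bad
```

Run it against the container's *-json.log file; an empty result means filebeat should parse every line cleanly.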

References:

filebeat docker input documentation: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-docker.html

Docker issue on log lines being split at 16KB: https://github.com/moby/moby/issues/36777

filebeat PR that merges Docker log lines split at 16KB: https://github.com/elastic/beats/pull/6967

 


Reposted from: https://blog.csdn.net/kozazyh/article/details/89262002