Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tail plugin read the same file multiple times (wildecard & NFS) #4102

Closed
SML0127 opened this issue Mar 18, 2023 · 6 comments
Closed

Tail plugin read the same file multiple times (wildecard & NFS) #4102

SML0127 opened this issue Mar 18, 2023 · 6 comments
Assignees
Labels
bug Something isn't working

Comments

@SML0127
Copy link

SML0127 commented Mar 18, 2023

Describe the bug

I tested reading files from an NFS mounted server using the tail plugin and wildcard * .
New files are added to the NFS mounted server in batches every hour, and it worked for about 3 hours.

But after that, duplicate data was being read and when i checked fluentd log, the same files were being tailed multiple time.
Here is my fluentd log.

2023-03-17 20:45:05 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__15__hid=16.log
2023-03-17 20:45:10 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__16__hid=16.log
2023-03-17 20:45:11 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__17__hid=16.log

2023-03-17 21:45:10 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__15__hid=17.log
2023-03-17 21:45:11 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__17__hid=17.log
2023-03-17 21:45:15 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__16__hid=17.log

2023-03-17 22:45:05 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__15__hid=18.log
2023-03-17 22:45:10 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__16__hid=18.log
2023-03-17 22:45:11 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__17__hid=18.log

2023-03-17 22:49:05 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__15__hid=15.log
2023-03-17 22:49:05 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__15__hid=16.log
2023-03-17 22:49:05 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__15__hid=17.log
2023-03-17 22:49:05 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__15__hid=18.log

2023-03-17 22:49:05 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__16__hid=15.log
2023-03-17 22:49:05 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__16__hid=16.log
2023-03-17 22:49:05 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__16__hid=17.log
2023-03-17 22:49:05 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__16__hid=18.log

2023-03-17 22:49:06 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__17__hid=15.log
2023-03-17 22:49:06 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__17__hid=16.log
2023-03-17 22:49:06 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__17__hid=17.log
2023-03-17 22:49:06 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__17__hid=18.log

From 2023-03-17 22:49:05, the same files started to be tailed repeatedly, and new logs or data were not added to those files.

Also, positions of the same files was appended multiple times in pos_file.

/fluentd/source-data/fluentd-5__17__hid=15.log 000000002e0118a2        00000000240c8533                                                                                                                                     
/fluentd/source-data/fluentd-5__15__hid=15.log 000000002b85ade7        00000000249739b4
/fluentd/source-data/fluentd-5__16__hid=15.log 000000002dc8f4ce        000000002447bf34
/fluentd/source-data/fluentd-5__15__hid=16.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__16__hid=16.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__17__hid=16.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__15__hid=17.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__17__hid=17.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__16__hid=17.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__15__hid=18.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__16__hid=18.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__17__hid=18.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__15__hid=15.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__15__hid=16.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__15__hid=17.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__15__hid=18.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__16__hid=15.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__16__hid=16.log 0000000000000000        0000000000000000

To Reproduce

This is my fluentd-conf.yaml
/fluentd/source-data/ is NFS mounted server directory

<source>
  @type tail
  path "/fluentd/source-data/#{hostname}__*.log"
  pos_file "/fluentd/source-data/pos_files/#{hostname}.pos"
  refresh_interval 5s 
  follow_inodes true 
  skip_refresh_on_startup true
  read_from_head true
  read_lines_limit  10000 
  tag tag 
  pos_file_compaction_interval 12h
  <parse>
    @type json
  </parse>
</source>

Expected behavior

I expected that fluentd reads each file once.

Your Environment

- Fluentd version: fluentd:v1.15.3-debian-1.1
- Operating system: PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
- Kernel version: Linux 4.15.0-207-generic

Your Configuration

<source>
  @type tail
  path "/fluentd/source-data/#{hostname}__*.log"
  pos_file "/fluentd/source-data/pos_files/#{hostname}.pos"
  refresh_interval 5s 
  follow_inodes true 
  skip_refresh_on_startup true
  read_from_head true
  read_lines_limit  10000 
  tag tag 
  pos_file_compaction_interval 12h
  <parse>
    @type json
  </parse>
</source>

Your Error Log

My fluentd log

2023-03-17 20:45:05 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__15__hid=16.log
2023-03-17 20:45:10 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__16__hid=16.log
2023-03-17 20:45:11 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__17__hid=16.log

2023-03-17 21:45:10 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__15__hid=17.log
2023-03-17 21:45:11 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__17__hid=17.log
2023-03-17 21:45:15 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__16__hid=17.log

2023-03-17 22:45:05 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__15__hid=18.log
2023-03-17 22:45:10 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__16__hid=18.log
2023-03-17 22:45:11 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__17__hid=18.log

2023-03-17 22:49:05 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__15__hid=15.log
2023-03-17 22:49:05 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__15__hid=16.log
2023-03-17 22:49:05 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__15__hid=17.log
2023-03-17 22:49:05 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__15__hid=18.log

2023-03-17 22:49:05 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__16__hid=15.log
2023-03-17 22:49:05 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__16__hid=16.log
2023-03-17 22:49:05 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__16__hid=17.log
2023-03-17 22:49:05 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__16__hid=18.log

2023-03-17 22:49:06 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__17__hid=15.log
2023-03-17 22:49:06 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__17__hid=16.log
2023-03-17 22:49:06 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__17__hid=17.log
2023-03-17 22:49:06 +0900 [info]: #0 following tail of /fluentd/source-data/fluentd-5__17__hid=18.log

My .pos file

/fluentd/source-data/fluentd-5__17__hid=15.log 000000002e0118a2        00000000240c8533                                                                                                                                     
/fluentd/source-data/fluentd-5__15__hid=15.log 000000002b85ade7        00000000249739b4
/fluentd/source-data/fluentd-5__16__hid=15.log 000000002dc8f4ce        000000002447bf34
/fluentd/source-data/fluentd-5__15__hid=16.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__16__hid=16.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__17__hid=16.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__15__hid=17.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__17__hid=17.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__16__hid=17.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__15__hid=18.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__16__hid=18.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__17__hid=18.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__15__hid=15.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__15__hid=16.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__15__hid=17.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__15__hid=18.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__16__hid=15.log 0000000000000000        0000000000000000
/fluentd/source-data/fluentd-5__16__hid=16.log 0000000000000000        0000000000000000

Additional context

No response

@daipom
Copy link
Contributor

daipom commented Mar 28, 2023

@SML0127
Thanks for reporting and sorry for the delay.

/fluentd/source-data/fluentd-5__15__hid=16.log 0000000000000000 0000000000000000

It is strange that both inode and pos are recorded as 0.
I'll look into possible causes.

Were there any other logs during this time?
For example detected rotation of ....
Especially, I'm wondering what happened in the four minutes between 22:45:11 and 22:49:05.

If possible, I would appreciate it if you could check if this also occurs on the local file system, not only on NFS.

@daipom daipom self-assigned this Mar 28, 2023
@SML0127
Copy link
Author

SML0127 commented Mar 31, 2023

@daipom

Were there any other logs during this time?
For example detected rotation of ....
Especially, I'm wondering what happened in the four minutes between 22:45:11 and 22:49:05.

There are no logs during that time, except following tail of XXX.
And I did nothing (just watched 🥲) between 22:45:11 and 22:49:05.

If possible, I would appreciate it if you could check if this also occurs on the local file system, not only on NFS.

I changed to local and ran it for hours, it didn't happen.
(I'm not sure if it didn't happen because I was lucky at this time or if it didn't happen because I changed NFS to local.)

@daipom
Copy link
Contributor

daipom commented Apr 1, 2023

@SML0127

I changed to local and ran it for hours, it didn't happen.

Thanks for confirming!
Were both path and pos_file on local?

I have heard before that we should not put pos_file on NFS.
If there is no need to put the pos_file on NFS, I am wondering if this problem could be avoided by just putting the pos_file on local storage.

I will check possible causes that accidentally make both inode and pos 0.

@SML0127
Copy link
Author

SML0127 commented Apr 1, 2023

@daipom

Were both path and pos_file on local?

yes, both path and pos_file on local

I have heard before that we should not put pos_file on NFS. If there is no need to put the pos_file on NFS, I am wondering if this problem could be avoided by just putting the pos_file on local storage.

I had the same issue when pos_file on local and path on NFS.

@daipom
Copy link
Contributor

daipom commented Aug 4, 2023

@SML0127 Sorry for my late response.
Fluentd v1.16.2 has many improvements to the follow_inodes option of in_tail.
Especially, it fixes some log duplication cases:

The package or docker images will be released soon.
If this problem still occurs on that version, please reopen this issue.

@daipom daipom closed this as completed Aug 4, 2023
@daipom daipom added bug Something isn't working and removed work-in-progress labels Aug 4, 2023
@SML0127
Copy link
Author

SML0127 commented Aug 5, 2023

Thx @daipom~
I hope it works well in the next version!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants