-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
in_tail: prevent wrongly unwatching with follow_inodes #4237
in_tail: prevent wrongly unwatching with follow_inodes #4237
Conversation
TODO: I'm checking if it is OK to unwatch the position tied to TailWatcher that is not detached yet. |
https://github.com/fluent/fluentd/actions/runs/5526720226/jobs/10081747103?pr=4237 The test Windows too. It may have something to do with this fix. https://github.com/fluent/fluentd/actions/runs/5526720222/jobs/10081747201?pr=4237 |
lib/fluent/plugin/in_tail.rb
Outdated
log.debug { | ||
target_paths_str = target_paths_hash.collect { |key, target_info| target_info.path }.join(",") | ||
existence_paths_str = existence_paths_hash.collect { |key, target_info| target_info.path }.join(",") | ||
"tailing paths: target = #{target_paths_str} | existing = #{existence_paths_str}" | ||
} | ||
|
||
unwatched_hash = existence_paths_hash.reject {|key, value| target_paths_hash.key?(key)} | ||
@pf&.unwatch_removed_targets(target_paths_hash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I might overlook something, but could you let me ask the following?
I considered the pattern that /var/log/test1.log was removed then its inode(inode-1) was reused by /var/log/test2.log in the setting path = /var/log/*.log*
.
- target_paths_hash will include inode-1, since /var/log/test2.log has the same inode as the removed /var/log/test1.log.
- So
@pf&.unwatch_removed_targets(target_paths_hash)
will skip inode-1 so the position info of inode-1 won't be reset. removed_hash
also won't include inode-1 due to the same reason.
In this case, won't fluentd misrecognize the previous position for /var/log/test2.log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for considering it!
That scenario can happen in the current master branch, right?
I think this logic should work in that scenario, and the position should be reset to zero.
I'm not sure this works in that scenario, though...
We should check it.
fluentd/lib/fluent/plugin/in_tail.rb
Lines 836 to 843 in e120693
# rotated file has the same inode number with the last file. | |
# assuming following situation: | |
# a) file was once renamed and backed, or | |
# b) symlink or hardlink to the same file is recreated | |
# in either case of a and b, seek to the saved position | |
# c) file was once renamed, truncated and then backed | |
# in this case, consider it truncated | |
@pe.update(inode, 0) if fsize < @pe.read_pos |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That scenario can happen in the current master branch, right?
I wanted to say that #4237 (comment) is the side effect introduced by adding @pf&.unwatch_removed_targets(target_paths_hash)
in your patch.
But maybe I was wrong, because added_hash
also won't include inode-1 in my suggested pattern.
I rewrote the scenario a bit as follows.
In the pattern that /var/log/test1.log
was removed then its inode(inode-1) was reused by /var/log/test2.log
in the setting path = /var/log/*.log*
:
target_paths_hash
will include inode-1, since/var/log/test2.log
has the same inode as the removed/var/log/test1.log
.- It means that
@pf&.unwatch_removed_targets(target_paths_hash)
will skip inode-1 so the position info of inode-1 won't be reset. removed_hash
also won't include inode-1 due to the same reason.added_hash
also won't include inode-1 due to the same reason.
- It means that
- So
refresh_watchers()
will call neitherstart_watchers()
norstop_watchers()
for both of/var/log/test1.log
and/var/log/test2.log
.
Yes, the above scenario can happen even in the current master branch.
The situation will be resolved after a new /var/log/test1.log
was created and @tails["/var/log/test1.log"]
got updated with a new inode.
So it won't be a big problem in real environment, I believe...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it rotation? (Is the test1.log
the same file as test2.log
?)
If so, it is correct that refresh_watcher
does nothing for inode-1
because it means the target inode-1
still exists.
In this case, update_watcher
update the TailWatcher.
However, I wonder the old TailWathcer can collect duplicate log in rotate_wait
interval...
I'll check it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, I wonder the old TailWathcer can collect duplicate log in rotate_wait interval...
I'll check it.
This may not be the point that you were originally concerned about, but I found this can cause log duplication.
This seems to be a problem that has existed for a long time.
This is a similar problem to #4237 (comment)
It seems that in_tail has a log duplication problem since a long time ago with rotate_wait
.
I will explain this in another comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I commented on this in #4237 (comment)
We must not unwatch targets that still exist. If unwatching an existing target, it causes log duplication. In the existing implementation, `update_watcher` needs to unwatch the old TailWatcher since `refresh_watcher` can't unwatch it when `update_watcher` is called first. It is because `update_watcher` discards the old TailWatcher and `refresh_watcher` can't recognize the old inode is disappeared. However, it can wrongly unwatch an existing inode because the old inode may still exist. (See the diff of test cases.) Thus, we need a new mechanism to correctly unwatch targets when follow_inodes. This fix is based on the idea that we should unwatch based on directly on PositionFile's data. Signed-off-by: Daijiro Fukuda <[email protected]> Co-authored-by: Takuro Ashie <[email protected]>
d9bfe4a
to
51848da
Compare
I fixed this so that this logic works only for |
This fix changes that unwatching is done immediately (without waiting Even if the existing TailWatcher's position is unwatched, the target file is already removed. |
The following test failure is one example of this.
It is difficult to explain why this test was unstable before rebasing this branch.
This fix changes this behavior. Because of the following logic, TailWathcer can update the position to zero even after being unwatched. fluentd/lib/fluent/plugin/in_tail.rb Lines 836 to 843 in e120693
Thus, this fix can cause the log duplication in this scenario. This happens only when the target having the same inode resurrects (disappears and appears again) with the smaller file size in the interval |
About #4237 (comment), I think we should fix the entire logic so that a new TailWatcher for the same target should be added after the detach of the old TailWatcher is completed. But it doesn't look like we'll have time to get there before the next release of 1.16... 😢 |
I have commented on some cases where multiple TailWatcher exists due to So now I think we should not be concerned about that point in this PR. #4237 (comment) The scenario: rotate -> update_watcher -> refresh_watcher ->
As above, <source>
@type tail
tag test
path /path/to/test.log*
pos_file /test/fluentd/pos/pos
read_from_head true
follow_inodes true
refresh_interval 5s
enable_stat_watcher false # To ensure that TailWathcer recognizes rotation
rotate_wait 30s
<parse>
@type none
</parse>
</source>
<match test.**>
@type stdout
</match>
|
In some cases, as in the case This problem should be solved fundamentally. |
The situation that this test cares seems rare, and it doesn't necessarily have to be resolved on
There are 2 possible reasons for same
For 1, we don't always need to resolve it (at least in this PR) as I described above. For 2, it's not possible in this case because the test.rb: #!/usr/bin/env ruby
require 'fileutils'
FileUtils.rm_rf("hoge.txt")
File.open("hoge.txt", "wb") { |f| f.puts("hoge") }
s1 = File.stat("hoge.txt")
puts "ino: #{s1.ino}"
if ARGV[0] == "keep-opened"
f = File.open("hoge.txt", "wb")
end
FileUtils.rm_rf("hoge.txt")
File.open("hoge.txt", "wb") { |f| f.puts("hage") }
s2 = File.stat("hoge.txt")
puts "ino: #{s2.ino}, same-inode?: #{s1.ino == s2.ino}" $ ./test.rb
ino: 20192736
ino: 20192736, same-inode?: true
$ ./test.rb keep-opened
ino: 20192736
ino: 20197162, same-inode?: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have commented on some cases where multiple TailWatcher exists due to rotate_wait.
I have checked the behavior of v1.16.1, and I found out that in_tail has this problem since a long time ago.
Not needed to resolve all of this kine of problems by this PR.
Since we need more time to resolve it fundamentally, I think that's enough for now as it solves part of them.
Thanks for your review! I agree. |
Thanks! |
Thanks! @ashie @masaki-hatada |
Which issue(s) this PR fixes:
What this PR does / why we need it:
We must not unwatch targets that still exist.
If unwatching an existing target, it causes log duplication.
This problem exists from v1.13.3 (the following fix).
This can occur when enabling
follow_inodes
.In the existing implementation,
update_watcher
needs to unwatch the old TailWatcher sincerefresh_watcher
can't unwatch it whenupdate_watcher
is called first (when rotation happens).It is because
update_watcher
discards the old TailWatcher andrefresh_watcher
can't recognize the old inode is disappeared.fluentd/lib/fluent/plugin/in_tail.rb
Lines 504 to 507 in e120693
However, it can wrongly unwatch an existing inode because the old inode may still exist.
(See the diff of test cases.)
Thus, we need a new mechanism to unwatch targets correctly.
This fix is based on the idea that we should unwatch based on directly on PositionFile's data.
Docs Changes:
No need.
Release Note:
Same as the title.
How to Reproduce
Config:
enable_stat_watcher
enable_stat_watcher false
is for making sure to detect rotation.nil
to it'srotate_handler
. This causes some problems. (I will create an issue or PR about this later.)in_tail
can't detect the rotation of the current file because of this, then this problem does not occur. The focus is on whether the rotation of the current file can be detected whenenable_stat_watcher
is enabled. If not, then this can occur only when disablingenable_stat_watcher
.Files:
Script to rotate:
Run:
Result:
refresh_watcher
.update_watcher
wrongly unwatches the existing inode.