Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

in_tail: Use inode for key of TailWatcher when follow_inodes #4185

Closed
wants to merge 1 commit into from
Closed

in_tail: Use inode for key of TailWatcher when follow_inodes #4185

wants to merge 1 commit into from

Conversation

kattz-kawa
Copy link
Contributor

@kattz-kawa kattz-kawa commented May 26, 2023

Which issue(s) this PR fixes:
Partially fixes #3614 (follow_inode true case)

What this PR does / why we need it:
Thank you for reading to the great developer's!
This PR is fixed for unexpected file close after logs rotate in fluentd v1.16.1.

I found the #3614 's reproducer.
Before applying this patch, fluentd cause unexpected file close after logs rotate every hour on my reproducer.
After applying this patch, fluentd does not cause it on my reproducer.
I have been running testing a long time.

Docs Changes:

Release Note:

@ashie
Copy link
Member

ashie commented May 26, 2023

Thanks for your contribution!

DCO check is failed: https://github.com/fluent/fluentd/pull/4185/checks?check_run_id=13780778605

Commit sha: 8f44700, Author: KatuyaKawakami, Committer: KatuyaKawakami; Can not find "KatuyaKawakami [email protected]", in ["Katuya Kawakami [email protected]", "Masaki Hatada [email protected]"].

Could you fix your name in the Author in your commit message?

- KatuyaKawakami
+ Katuya Kawakami

@ashie
Copy link
Member

ashie commented May 26, 2023

Could you fix your name in the Author in your commit message?

It's resolved, thanks!

@ashie
Copy link
Member

ashie commented May 26, 2023

Fix #3614

Could you add partially or something similar to this line in your commit message & the first comment of this PR?
As I described in #3614, it doesn't seem fix the issue on follow_inode false case.

@ashie
Copy link
Member

ashie commented May 26, 2023

It seems that CI stalls on all platforms.

@masaki-hatada
Copy link
Contributor

Fix #3614

Could you add partially or something similar to this line in your commit message & the first comment of this PR? As I described in #3614, it doesn't seem fix the issue on follow_inode false case.

Thank you for commenting.
Katsuya will handle your comment next week.

By the way, please let me explain why we didn't take care of follow_inode false case.

To fix this unexpected file close issue, we have to modify the tail plugin to create a tail list with hash values generated by inode numbers.
But users who use follow_inode false might want the plugin to create a tail list with hash values generated by file names.
That's why we hesitated to take care of follow_inode false case in this commit.

In our understanding, follow_inode false is just for keeping the compatibility with old verions of the tail plugin.
https://docs.fluentd.org/input/tail#follow_inodes explains that follow_inode false can cause "read rotated files duplicately" problem.
So every user should use follow_inode true unless there is some particular reason.

Please let me know if our understanding is wrong.
We can change our code if needed.

@ashie
Copy link
Member

ashie commented May 30, 2023

It seems that CI stalls on all platforms.

$ bundle install
$ bundle exec rake test TEST=test/plugin/test_in_tail.rb TESTOPTS="-v"
...
TailInputTest: 
  test_EACCES:																.: (0.513357)
...
  group rules line limit resolution: 
    test: valid:															.: (0.003195)
  inode_processing: 
    test_should_close_watcher_after_rotate_wait:											

The cause is test_should_close_watcher_after_rotate_wait doesn't follow the in_tail's modification.
In addition, your patch seems to be missing a fix.

diff --git a/lib/fluent/plugin/in_tail.rb b/lib/fluent/plugin/in_tail.rb
index 71d1be9b..27ee48da 100644
--- a/lib/fluent/plugin/in_tail.rb
+++ b/lib/fluent/plugin/in_tail.rb
@@ -443,7 +443,11 @@ module Fluent::Plugin
         return
       end
 
-      @tails[path] = tw
+      if @follow_inodes
+        @tails[target_info.ino] = tw
+      else
+        @tails[path] = tw
+      end
       tw.on_notify
     end
 
diff --git a/test/plugin/test_in_tail.rb b/test/plugin/test_in_tail.rb
index b1e28fc9..37fbf950 100644
--- a/test/plugin/test_in_tail.rb
+++ b/test/plugin/test_in_tail.rb
@@ -2156,13 +2156,13 @@ class TailInputTest < Test::Unit::TestCase
       target_info = create_target_info("#{@tmp_dir}/tail.txt")
       mock.proxy(Fluent::Plugin::TailInput::TailWatcher).new(target_info, anything, anything, true, true, anything, nil, anything, anything).once
       d.run(shutdown: false)
-      assert d.instance.instance_variable_get(:@tails)[target_info.path]
+      assert d.instance.instance_variable_get(:@tails)[target_info.ino]
 
       Timecop.travel(now + 10) do
         d.instance.instance_eval do
-          sleep 0.1 until @tails[target_info.path] == nil
+          sleep 0.1 until @tails[target_info.ino] == nil
         end
-        assert_nil d.instance.instance_variable_get(:@tails)[target_info.path]
+        assert_nil d.instance.instance_variable_get(:@tails)[target_info.ino]
       end
       d.instance_shutdown
     end

Even though with above fix, some more other tests are failed.
We need to fix these tests too.

@kattz-kawa
Copy link
Contributor Author

@ashie

Thank you for your comment and analysis.
I'm digging it.

@ashie
Copy link
Member

ashie commented May 30, 2023

In our understanding, follow_inode false is just for keeping the compatibility with old verions of the tail plugin.

Basically your thought is reasonable, we should make follow_inode true default when it's proofed enough. But I don't think it's proofed enough. It's introduced at v1.12 and we experienced some fatal issues in v1.12 - v1.15.

By the way, please let me explain why we didn't take care of follow_inode false case.

Even though in follow_inode case unexpected stall isn't acceptable.
Although follow_inode true is intended to resolve log duplication issue originally, modification for follow_inode feature seems have regression, it introduces stall issue as in #3614. Such reports are increased after we merge this feature.

@kattz-kawa
Copy link
Contributor Author

@ashie

Could you add partially or something similar to this line in your commit message & the first comment of this PR?

Sure, I rewrite commit message and comment of this PR.

Even though with above fix, some more other tests are failed.
We need to fix these tests too.

I revised and rewrite the codes.
All tests has passed locally.
Could you review this if all test passed in the github workflow?

@daipom
Copy link
Contributor

daipom commented Jun 1, 2023

Thanks for this fix! I'm seeing this too.

@ashie
Copy link
Member

ashie commented Jun 6, 2023

As I've described in #3614 (comment), I'm now positive to merge this.
I'll recheck remaining concerns.

@daipom
Copy link
Contributor

daipom commented Jun 7, 2023

To me, this looks like a reasonable fix to address the problem of one scenario (#3614 (comment)).

Though I think @ashie has a better grasp of the concerns of this fix, my concern is this code.

@tails.delete(rotated_target_info.ino)

Why do we need this deletion?

@daipom
Copy link
Contributor

daipom commented Jun 7, 2023

@tails.delete(rotated_target_info.ino)

Why do we need this deletion?

Hmm? Perhaps I'm confusing something...

@ashie
Copy link
Member

ashie commented Jun 7, 2023

Why do we need this deletion?

Whey the key for @tails is path, old TailWatcher can be dropped just by @tails[path] = tw.
But when the key is inode, old one cannot be dropped by @tails[inode] = true, need to drop it explicitly.
rotated_target_info.ino != new_ino here.

@masaki-hatada
Copy link
Contributor

Why do we need this deletion?

Whey the key for @tails is path, old TailWatcher can be dropped just by @tails[path] = tw. But when the key is inode, old one cannot be dropped by @tails[inode] = true, need to drop it explicitly. rotated_target_info.ino != new_ino here.

Yes, ashie-san is correct.

When using path as a key of "@tail", the key would be the same between a rotated log file and a new log file.
When using inode as a key, the key would be different between two files. So we need to delete old "@tail" entry here.

@daipom
Copy link
Contributor

daipom commented Jun 7, 2023

Why do we need this deletion?

Whey the key for @tails is path, old TailWatcher can be dropped just by @tails[path] = tw. But when the key is inode, old one cannot be dropped by @tails[inode] = true, need to drop it explicitly. rotated_target_info.ino != new_ino here.

Thanks. I can imagine that, but something seems inconsistent to me...

The current implementation of managing TailWatcher by the paths may not be wrong even if follow_inodes is enabled.

For TailWatcher, the member @path and @ino are both constant.
So we have to recreate it when one of those needs to be updated.
Currently, TailWatchers are managed by linking the paths.
Basically, it has nothing to do with follow_inodes.

It should be symmetrical whether we use the paths or the inodes as the keys for @tails.
So, I think something is wrong here with the need for deletion.

@daipom
Copy link
Contributor

daipom commented Jun 7, 2023

If we use the inodes for the keys of @tails, shouldn't we recreate the watchers by the update of the paths for each inode?

Otherwise, it's not symmetrical.
(I don't think it has to be symmetrical if it doesn't seem to have any unintended effects, though)

@daipom
Copy link
Contributor

daipom commented Jun 7, 2023

Sorry, maybe I'm saying something wrong...
I will reconsider this asymmetry.

@daipom
Copy link
Contributor

daipom commented Jun 7, 2023

Now I understand the cause of this asymmetry.
It is caused by the watcher getting a new inode from the watcher's (old) path to determine the update.

@daipom
Copy link
Contributor

daipom commented Jun 7, 2023

I have still some concerns about changing the key...

Now I understand the cause of this asymmetry. It is caused by the watcher getting a new inode from the watcher's (old) path to determine the update.

Given this, it does not seem wrong to manage watchers by using the paths as the keys, regardless of follow_inodes.
The current update process of update_watcher seems not wrong to me.

It seems to me that the problem exists in refresh_watchers.

@ashie
Copy link
Member

ashie commented Jun 7, 2023

Given this, it does not seem wrong to manage watchers by using the paths as the keys, regardless of follow_inodes.

You are right, collecting TailWatchers is done by path by refresh_watcher (according to path setting in <source> section), so managing by path isn't wrong regardless of follow_inodes.

I think changing the key is just for comprehensibility of the code.

@masaki-hatada
Copy link
Contributor

Sorry, I couldn't understand what is "asymmetry".

Please let me explain my idea.

For example, let's consider the case of that fluentd watches "/var/log/0.log".
Now the log file was rotated. An old "/var/log/0.log" uses inodeA and a new "/var/log/0.log" uses inodeB.

When using a path as a key of @tails, both of old and new TailWatcher objects would be stored to the same @tails["/var/log/0.log"].
On the other hand, when using an inode as a key of @tails, the old one would be stored to @tails[inodeA] but the new one would be stored to @tails[inodeB].
So @tails[inodeA] would be no longer used after "/var/log/0.log" got rotated. That's why we remove @tails[inodeA] entry.
It doesn't mean that we dispose the old TailWatcher object here. It would be disposed by the following line.

      detach_watcher_after_rotate_wait(rotated_tw, pe.read_inode) if rotated_tw

The old TailWatcher would be disposed in the same line even in follow_inodes false.
So I think that it's symmetry.

Given this, it does not seem wrong to manage watchers by using the paths as the keys, regardless of follow_inodes.
The current update process of update_watcher seems not wrong to me.

To change the key from path to inode is the most important point of this fix.

We believe that the issue is caused by using the same @tails entry even after the rotation.
As what #4190 explains, @tails["/var/log/0.log"] might have been already updated to the new "/var/log/0.log" by refresh_watchers(). In that case, update_watcher() would close the new one wrongly.

The path is the same even after the rotation, but the inode is changed.
So a different @tails entry would be used after the rotation if we use inode as a key. We can prevent wrongly closing by that.

I think changing the key is just for comprehensibility of the code.

Our idea is to use inode as a key of all functions of in_tail.rb when follow_inodes true.
The plugin would be no longer work if we changed the key only in some of functions.

@daipom
Copy link
Contributor

daipom commented Jun 7, 2023

Thanks, everyone.

I am wondering if the key of @tails is not a fundamental part of this problem.

I'm still not sure, but I commented in #3614 (comment) on what I thought was the possible alternative solution.

@ashie
Copy link
Member

ashie commented Jun 7, 2023

To change the key from path to inode is the most important point of this fix.

We believe that the issue is caused by using the same @tails entry even after the rotation. As what #4190 explains, @tails["/var/log/0.log"] might have been already updated to the new "/var/log/0.log" by refresh_watchers(). In that case, update_watcher() would close the new one wrongly.

I think changing the key is just for comprehensibility of the code.

Our idea is to use inode as a key of all functions of in_tail.rb when follow_inodes true. The plugin would be no longer work if we changed the key only in some of functions.

Ah, sorry, just for comprehensibility of the code is misleading.
We understand what's wrong and why this PR can fix the issue.
We are talking about whether there are more ideal solution or not.

Because tail watchers are collected according to path setting in <source> section, managing by path isn't wrong ideally.
(It causes this issue in actual though).
We think there are some other ideas to resolve it without changing the key, #4191 is one of such idea.

@daipom
Copy link
Contributor

daipom commented Jun 7, 2023

We understand what's wrong and why this PR can fix the issue.
We are talking about whether there are more ideal solution or not.

Yes, I too think so.
Sorry for my misleading explanation too.

@daipom
Copy link
Contributor

daipom commented Jun 7, 2023

Sorry, I couldn't understand what is "asymmetry".

Sorry, I just wanted to say that path as the key is reasonable for the current structure, regardless of follow_inodes.
I think this PR would solve the problem, but there may be other ways.
If there is no other good way, I think this PR is a good solution at this point.

A: The current implementation:

  • Key of @tails: path
  • Update of TailWatcher is processed based on the path
    • Based on the path, it sees the change of ino.

B: This PR's implementation:

  • Key of @tails: ino
  • Update of TailWatcher is processed based on the path
    • Based on the path, it sees the change of ino.

C: State that I feel is symmetrical to A. (In this case, we don't need to worry about the deletion.)

  • Key of @tails: ino
  • Update of TailWatcher is processed based on the ino
    • Based on the ino, it sees the change of path.

The important point is that the current TailWatcher is updated based on its path by TimerTrigger or StateWatcher, and we don't want to change it.

B will certainly solve #3614 problem partially, but if we only consider the update process of TailWatcher (update_watcher), the current A is reasonable.

@daipom
Copy link
Contributor

daipom commented Jun 7, 2023

Thanks to this PR and #4191, I am starting to see problems that I didn't understand before.
Thank you very much. I would like to think about it some more.

@masaki-hatada
Copy link
Contributor

Hi @ashie san, @daipom san,

Thank you for the explanations.
Maybe I got your idea.

The important point is that the current TailWatcher is updated based on its path by TimerTrigger or StateWatcher, and we don't want to change it.

I couldn't find a code that fluentd passes path to TimerTrigger.
But at least fluentd passes path to StatWatcher.
StatWatcher is the child class of Coolio::StatWatcher. We cannot modify Coolio::StatWatcher to use inode instead of path.
I imagine that's why you guys want to keep path as the key of @tails.

But I feel that it's not weird even if we use inode as the key even though StatWatcher still uses path.
In my understanding:

  • StatWatcher is used just for detecting the update of a log file specified by path.
  • tw.on_notify() will be called if StatWatcher detected the update. It calls @rotate_handler.on_notify(stat) then check if inode got changed.
  • @on_rotate would be called if inode change was detected.

To change the key of @tails would have no impact for the above.
on_rotate() cares inode. So I feel that it's natural that @tails is also managed by inode.

We think there are some other ideas to resolve it without changing the key, #4191 is one of such idea.

Currently I haven't found any negative side-effect of #4191.
I agree with that #4191 is good if fluentd community wants a small fix.
(I also think that #4191's if @follow_inodes && tw.ino != ino might be able to be changed just to if tw.ino != ino, even though I haven't chekced it deeply)

But #4191 is a last catch fix as @garyzjq san said.

In our opinion, to use path as the key of @tails even in follow_inodes true case can reduce the maintainability of this plugin.
The difference of the behavior of follow_inodes true and follow_inodes false is big, I think.
For example:

  • The same entry of pos file is updated even after rotation in follow_inodes false
  • But a new entry is created in follow_inodes true case.

In follow_inodes false case:

/var/log/pods/aaaa/bbbb/0.log      000000000236aab7        000000000860571f <= the same entry is updated even after rotation

In follow_inodes true case:

/var/log/pods/aaaa/bbbb/0.log      ffffffffffffffff        000000000860571f
/var/log/pods/aaaa/bbbb/0.log      ffffffffffffffff        0000000008605732
/var/log/pods/aaaa/bbbb/0.log      000000000236aab7        000000000860571f <= A new entry is added after rotation

To match the behavior of @tails to the above is good to avoid confusion, I think.

Of course I can understand that a big code change isn't preferred to avoid regressions.
So It's still ok if the community doesn't choose our fix. It's a difficult decision...

@daipom
Copy link
Contributor

daipom commented Jun 29, 2023

Hmm, there is a failure case that is concerned with this 🤔

https://github.com/fluent/fluentd/actions/runs/5408475806/jobs/9827580198?pr=4185

Oh, somehow, the first TailWatcher is not detached and not unwatched.
I've tried several times locally and still can't reproduce it.

We should check if there is a possibility that the results change depending on the timing...
Anyway, some handlers not being closed is a problem...

@masaki-hatada
Copy link
Contributor

The following "path" should be changed to "key" or something for avoiding confusion. (We didn't notice this when posting the patch in past...)

@tails.keys.each do |path|
tw = @tails.delete(path)

Thank you for rebasing our patch. We will recheck this and write here if we found any issue.

@daipom
Copy link
Contributor

daipom commented Jun 29, 2023

The following "path" should be changed to "key" or something for avoiding confusion. (We didn't notice this when posting the patch in past...)

@tails.keys.each do |path|
tw = @tails.delete(path)

Thanks!
Fixed: https://github.com/fluent/fluentd/compare/719cdee2fe1235c7ef7bf69f3e708f3bf25c12ff..0915daf405ca75d9695fbd0094643b6365bfb365

Thank you for rebasing our patch. We will recheck this and write here if we found any issue.

Thanks! If you find any issue, feel free to fix it!

@daipom daipom changed the title in_tail: Fix for unexpected file close after logs rotate in update_wa… in_tail: Use inode for key of TailWatcher Jun 30, 2023
@daipom daipom changed the title in_tail: Use inode for key of TailWatcher in_tail: Use inode for key of TailWatcher when follow_inodes Jun 30, 2023
@masaki-hatada
Copy link
Contributor

Hi @daipom san, @ashie san,

Just FYI.
Currently @kattz-kawa is also running #4185 (comment) in our local env, but we haven't faced the issue.

@daipom
Copy link
Contributor

daipom commented Jul 4, 2023

Thanks for your check! I want to conclude by the end of this week.
I will consider this point: #4185 (comment)

@masaki-hatada
Copy link
Contributor

I'm also confusing why this commit can solve #4221 .
I leave a memo.

Concern1: Really is it ok if update_watcher() detaches rotated log files?

I had thought only the case that fluentd monitors only current log files.
(Because this is the configuration we use for our environment)

  path /test/fluentd/input4/test.log

Rotated log files can be detached in update_watcher() in this case.

But how about the case that fluentd monitors also rotated log files?

  path /test/fluentd/input4/test.log*

Rotated log files should continue to be monitored even after they got renamed. (e.g., test.log => test.log.1)
So my question is "Is it ok if update_watcher() detaches rotated log files in that case?".

If the answer is "No", we need to leave the detaching task to stop_watchers() called in refresh_watchers() instead of update_watcher().

Concern2: Why cannot stop_watchers() work when follow_inodes is true?

The following line mentions "When follow_inodes is true, it's not cleaned up by refresh_watcher".

# When follow_inodes is true, it's not cleaned up by refresh_watcher.

Maybe it means that stop_watchers() isn't called by refresh_watchers() when follow_inodes is true.
I considered why it isn't called.

  1. expand_paths() creates a hash list. The key will be inode when follow_inodes is true.
    hash[target_info.ino] = target_info

    => If inode-1 of a deleted file(log-A) was reused with another new log file(log-B), log-B's TailWatcher will be stored in to its hash[inode-1].
  2. existence_path() also creates a hash list. The key will be inode when follow_inodes is true.
    hash[tw.ino] = TargetInfo.new(tw.path, tw.ino)

    => log-A's TailWatcher will be stored into its hash[inode-1]
  3. Compare existence_path's hash list and expand_paths's hash list to create unwatched_hash
    unwatched_hash = existence_paths_hash.reject {|key, value| target_paths_hash.key?(key)}

    => log-A's TailWatcher won't be added to unwatched_hash since we can find the same inode-1 key from expand_paths's hash list. But the entry of expand_paths's hash list is for log-B. This is not for log-A.
    ==> Cannot call stop_watchers() for log-A.
    ==> Cannot call start_watchers() for log-B due to the same reason.

So we need to detach log-A's TailWatcher in update_watcher() instead of stop_watchers().
(refresh_watchers() becomes able to call start_watchers() for log-B after the detaching)

How can we solve the conflict between Concern1 and 2?

We can ignore Concern1 if the expected behavior of the tail plugin is that TailWatcher object should be recreated every time the log file path got changed.
However, to avoid reading the same log twice, we should tell the previous position to the new TailWatcher object.
But where is the code to do that?

Isn't the plan B of #4221 (comment) still needed even after we use inode as a key?

@daipom
Copy link
Contributor

daipom commented Jul 6, 2023

Thanks for your considering it!

Concern1: Really is it ok if update_watcher() detaches rotated log files?

I think It is OK.
Newly created TailWatchers will tail it.

  • TailWatcher-A (test.log (current))
  • Rotation
    • test.log -> test.log.1
    • new test.log
  • TailWatcher-A detaches test.log and newly creates TailWatcher-A' (test.log)
  • test.log.1 is newly found in refresh_watcher and a new TailWatcher-B (test.log.1) is created, but it doesn't tail it due to the pos file.
  • Rotation again
    • test.log.1 -> test.log.2
    • test.log -> test.log.1
    • new test.log
  • TailWatcher-A' detaches test.log and newly creates TailWatcher-A'' (test.log)
  • TailWatcher-B detaches test.log.1 and newly creates TailWatcher-B' (test.log.1)
  • test.log.2 is newly found in refresh_watcher ...
  • ...

Concern2: Why cannot stop_watchers() work when follow_inodes is true?

Your assumptions seem mostly correct.

This is because refresh_watcher cannot recognize that the inode is disappeared.

  • As you say, refresh_watcher recognize the disappeared targets by comparing @tails and the existing file lists.
  • However, update_watcher overwrites @tails[path], so the information about the old inode of the path is lost from @tails.
  • So, refresh_watcher can't find the old inode from @tails, and can't recognize that the inode is disappeared.

This happens because the key of @tails is path.
If it is inode, then that overwriting doesn't occur, and refresh_watcher can unwatch the disappeared inode by rotation.

However, to avoid reading the same log twice, we should tell the previous position to the new TailWatcher object.
But where is the code to do that?

These lines load the position from the pos file.
If the previous position is recorded, it is loaded here and passed to the new TailWatcher.

pe = @pf[target_info]

new_position_entry = @pf[new_target_info]

@masaki-hatada
Copy link
Contributor

@daipom san,

Thank you for your explanation!

Concern2: Why cannot stop_watchers() work when follow_inodes is true?

Your assumptions seem mostly correct.

This is because refresh_watcher cannot recognize that the inode is disappeared.

  • As you say, refresh_watcher recognize the disappeared targets by comparing @tails and the existing file lists.
  • However, update_watcher overwrites @tails[path], so the information about the old inode of the path is lost from @tails.
  • So, refresh_watcher can't find the old inode from @tails, and can't recognize that the inode is disappeared.

This happens because the key of @tails is path. If it is inode, then that overwriting doesn't occur, and refresh_watcher can unwatch the disappeared inode by rotation.

I was worried if @tails[inode] can be still overwritten by another log file due to inode re-use while fluentd still has a TailWatcher of the old log file.
But I noticed that it cannot happen, since inode won't be freed until fluentd closes the old file by update_watcher()^^;
So my concern was cleared.

However, to avoid reading the same log twice, we should tell the previous position to the new TailWatcher object.
But where is the code to do that?

These lines load the position from the pos file. If the previous position is recorded, it is loaded here and passed to the new TailWatcher.

pe = @pf[target_info]

new_position_entry = @pf[new_target_info]

Ok. We can refer to the previous position by using inode because we set follow_inodes to true.

My current concern is that the previous position might have been cleared by the following line.

entry.update_pos(UNWATCHED_POSITION)

The unwatch() function will be called when detach_watcher_after_rotate_wait() is called with tail_watcher.unwatched = true.
The current code sets tail_watcher.unwatched = true when follow_inodes is true.

if @follow_inodes
# When follow_inodes is true, it's not cleaned up by refresh_watcher.
# So it should be unwatched here explicitly.
tail_watcher.unwatched = true
new_position_entry = @pf[new_target_info]
# If `refresh_watcher` find the new file before, this will not be zero.
# In this case, only we have to do is detaching the current tail_watcher.
if new_position_entry.read_inode == 0
@tails[path] = setup_watcher(new_target_info, new_position_entry)
@tails[path].on_notify
end
else
@tails[path] = setup_watcher(new_target_info, pe)
@tails[path].on_notify
end
detach_watcher_after_rotate_wait(tail_watcher, pe.read_inode)

  • TailWatcher-A detaches test.log and newly creates TailWatcher-A' (test.log)
  • test.log.1 is newly found in refresh_watcher and a new TailWatcher-B (test.log.1) is created, but it doesn't tail it due to the pos file.

Isn't the entry of the pos file cleared before test.log.1 is newly found?

@daipom
Copy link
Contributor

daipom commented Jul 7, 2023

My current concern is that the previous position might have been cleared by the following line.

The unwatch() function will be called when detach_watcher_after_rotate_wait() is called with tail_watcher.unwatched = true. The current code sets tail_watcher.unwatched = true when follow_inodes is true.

  • TailWatcher-A detaches test.log and newly creates TailWatcher-A' (test.log)
  • test.log.1 is newly found in refresh_watcher and a new TailWatcher-B (test.log.1) is created, but it doesn't tail it due to the pos file.

Isn't the entry of the pos file cleared before test.log.1 is newly found?

Yes!
The PR #4221 is to fix this problem!

@daipom
Copy link
Contributor

daipom commented Jul 7, 2023

Instead of #4221, I would like to fix it as follows:

  • Change the key of @tails to inode (This PR)
  • Stop unwatching in update_watcher (Future PR)

@masaki-hatada
Copy link
Contributor

Instead of #4221, I would like to fix it as follows:

  • Change the key of @tails to inode (This PR)
  • Stop unwatching in update_watcher (Future PR)

Ok. I got that Both of PRs are required for fixing #4221 .
Thank you for your explanation!

@daipom
Copy link
Contributor

daipom commented Jul 7, 2023

Thank you too for all your help!
If you have any concerns or anything you've noticed, please let me know!

@daipom
Copy link
Contributor

daipom commented Jul 10, 2023

#4185 (comment)

I'm thinking about the correct specifications.
I think the detached TailWathcer must be surely removed from @tails.
If not removing it from @tails, then it must not be detached.

We need to add the fix.

@daipom
Copy link
Contributor

daipom commented Jul 11, 2023

Instead of #4221, I would like to fix it as follows:

  • Change the key of @tails to inode (This PR)
  • Stop unwatching in update_watcher (Future PR)

Ok. I got that Both of PRs are required for fixing #4221 . Thank you for your explanation!

Sorry, I said I wanted to stop watching in update_wathcer in the future PR, but now I think we should do it in this PR.
It is not good to separate these fixes since using inodes for the key directly makes it unnecessary.

@daipom
Copy link
Contributor

daipom commented Jul 11, 2023

I fixed the concerns. I think this is ideal behavior.
I will check the actual behavior more.

@daipom
Copy link
Contributor

daipom commented Jul 11, 2023

@daipom
Copy link
Contributor

daipom commented Jul 11, 2023

Sorry, one test is still failing...
I will fix it.

@daipom
Copy link
Contributor

daipom commented Jul 11, 2023

This is because refresh_watcher cannot recognize that the inode is disappeared.

* As you say,  `refresh_watcher` recognize the disappeared targets by comparing `@tails` and the existing file lists.

* However, `update_watcher` overwrites `@tails[path]`, so the information about the old inode of the path is lost from `@tails`.

* So, `refresh_watcher` can't find the old inode from `@tails`, and can't recognize that the inode is disappeared.

This happens because the key of @tails is path.
If it is inode, then that overwriting doesn't occur, and refresh_watcher can unwatch the disappeared inode by rotation.

I'm sorry again 😢
By using inode for the key, the overwriting of @tails does not occur anymore.
However, we remove TailWatcher from @tails, so the information of old inode is lost from @tails and refresh_watcher can not recognize the inode is disappeared after all... 😢

@tails.delete(tail_watcher.ino)

@daipom
Copy link
Contributor

daipom commented Jul 11, 2023

I believe the current logic is the ideal behavior for update_watcher if using inode for the key.
However, the problem of unwatching still remains.

@masaki-hatada
Copy link
Contributor

masaki-hatada commented Jul 11, 2023

The following is my idea, although there might be still some overlooks...

  • Use inode as a key of @tails if follow_inodes = true.
  • Always use path as a key of hash in existence_path() and expand_paths() regardless of follow_inodes is true or not.
  • Don't make update_watcher() to call detach_watcher_after_rotate_wait() if follow_inodes is true. Don't remove the TailWatcher from @tail here, too. Instead, update the path of its TailWatcher.
  • Make start_watchers() to create a new TailWatcher only when if the same inode doesn't exist in @tails if follow_inodes is true.

The base ideas of this are:

  • Don't recreate TailWatcher even if log files got rotated. Keep using the same TailWatchers until fluentd stopped monitoring them"
  • Do all tasks of adding/removing TailWatcher to/from @tails on refresh_watchers() side. What update_watcher() should do is just to update the path of TailWatcher to the latest one when the log was rotated.

This may improve the maintainability.

Note: Regardless of `follow_inodes`, both path or inode is acceptable
for the key of `tails`.
It is not wrong that the current logic uses `path` for the key.

Perhaps we don't need to use different keys depending on
`follow_inodes`. We can always use inode for the key. (Future work)

Signed-off-by: Katuya Kawakami <[email protected]>
Signed-off-by: Masaki Hatada <[email protected]>
Co-authored-by: Daijiro Fukuda <[email protected]>
@daipom
Copy link
Contributor

daipom commented Jul 11, 2023

@kattz-kawa @masaki-hatada I have pushed the commit to solve the conflicts.
After all, I think we can change the key to solve some problems in #4221.

I thought this PR could nicely solve the current problem of unwatching (#4221), but it was wrong.
The problem of unwatching should be fixed in another PR.

@daipom
Copy link
Contributor

daipom commented Jul 11, 2023

I'm very sorry for rebasing this branch again and again.
Now this PR is for simply changing the key.
I want to consider whether to include this PR in the next version while considering the remaining in_tail issues.

After all, the only change I made is to ensure that the old TailWatcher is removed from @tails. (#4185 (comment))

@tails.delete(tail_watcher.ino)

(It was done in if new_position_entry.read_inode == 0 block originally, but I moved it outside the if block)

@daipom
Copy link
Contributor

daipom commented Jul 11, 2023

@masaki-hatada @kattz-kawa I'm very sorry for troubling you all.

Thanks for your suggestion to solve the problem of unwatching: #4185 (comment)

Maybe that problem should be handled separately from this PR.
(There's nothing wrong with this PR itself.)
I'll take this idea into account and see what I can do about this problem.

@ashie ashie added the pending To be done in the future label Jul 19, 2023
@ashie
Copy link
Member

ashie commented Mar 7, 2024

For now we don't have plan changing the key from path to inode, so I close this.
In the future we may consider it again, we'll refer this pull request again at that time.
Thanks for your great help investigating this issue!

@ashie ashie closed this Mar 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pending To be done in the future
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fluentd in_tail "unreadable" file causes "following tail of <file>" to stop and no logs pushed
4 participants