Bryan Cantrill
2012-07-29 05:38:03 UTC
All,
We have a bit of a thorny problem with tail -f. Currently, our tail -f
(and, it must be said, Solaris 11's) does not pick up a truncation of the
file. That is, if you're doing a tail -f of /tmp/foo, and /tmp/foo gets
truncated and subsequently written to, your tail won't print any output
until the size of /tmp/foo exceeds the seek offset at the time
of truncation (and even then, you'll be missing the new contents of the
file before that point). This seems busted -- and it's not how
either BSD tail or GNU tail behaves.
Now, we have a couple of options. GNU tail -f takes the obvious (but also
obviously flawed) approach of periodically fstat()'ing the file
and comparing the st_size to the old st_size; if the st_size is less than
the old st_size, it (somewhat uselessly, in my opinion) prints out a
message ("tail: /tmp/foo: file truncated") and then (somewhat brokenly, in
my opinion) moves its seek offset to the new end of the file. (That is,
whatever was written to the file between truncation and the fstat() is
never printed.) Subsequent appends to the file, however, will (correctly)
result in output. Note that this approach leads to some very odd
behavior: if you're doing a tail -f of a file that has 10 bytes, and that
file is recreated (opened O_TRUNC) and then has 100 bytes written to it
immediately, you'll see only the last 90 bytes.
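To make the 90-byte case concrete, here is a minimal C sketch of the size-polling heuristic. It models the behavior described above and is not GNU's source. `poll_follow()` and `recreate_file()` are hypothetical names. The caller is assumed to call `poll_follow()` once per polling interval.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch of the GNU-style heuristic: fstat() the file,
 * and if it has shrunk since the last poll, assume truncation and jump
 * to the new end. Returns the number of new bytes read into buf, 0 if
 * there is nothing new, or -1 on error. *old_size holds the size seen
 * at the previous poll.
 */
ssize_t
poll_follow(int fd, off_t *old_size, char *buf, size_t bufsz)
{
	struct stat st;

	if (fstat(fd, &st) != 0)
		return (-1);

	if (st.st_size < *old_size) {
		/*
		 * The file shrank: seek to the new end. Anything written
		 * between the truncation and this fstat() is skipped.
		 */
		if (lseek(fd, st.st_size, SEEK_SET) == -1)
			return (-1);
	}
	*old_size = st.st_size;

	/* Read whatever lies between our current offset and EOF. */
	return (read(fd, buf, bufsz));
}

/*
 * Hypothetical test helper: recreate path with O_TRUNC and write len
 * copies of c, mimicking a program that rewrites the followed file.
 */
int
recreate_file(const char *path, char c, size_t len)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd == -1)
		return (-1);
	for (size_t i = 0; i < len; i++) {
		if (write(fd, &c, 1) != 1) {
			(void) close(fd);
			return (-1);
		}
	}
	return (close(fd));
}
```

Suppose we follow a 10-byte file, recreate it with 100 bytes, and poll again. The second poll returns only 90 bytes. The size never appeared to shrink, so the offset stays at 10 and the first 10 new bytes are never printed.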
BSD (including OS X) tail -f takes the much better approach of using
kevent in such a way that it can determine when the file has been
truncated, adjusting its seek pointer appropriately. (It can only do this
when using the USE_PORT action -- which is why ours is busted.) This
means that BSD tail -f essentially always does the Right Thing: it will
catch file truncation and print the new contents, and if you truncate the
file and overwrite it with a size larger than the old size, it will
correctly print all of the new contents.
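By contrast, suppose the follower is told about truncation explicitly, via kevent on BSD or via something like the FILE_TRUNC event proposed in this message. Then deciding where to resume no longer depends on comparing sizes. The sketch below is hypothetical: `resume_offset()` and `bytes_to_print()` are illustrative names, not BSD tail's code. It also assumes truncation is always to zero length, as with O_TRUNC.

```c
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical sketch: choose where to resume reading after a change
 * notification that says whether the file was truncated.
 */
off_t
resume_offset(off_t offset, off_t new_size, bool truncated)
{
	(void) new_size;	/* size is irrelevant once we know */

	if (truncated) {
		/*
		 * Assumption: truncation was to zero (as with O_TRUNC),
		 * so re-read everything written since.
		 */
		return (0);
	}
	/* Ordinary append: continue from where we left off. */
	return (offset);
}

/* How many bytes would be printed for this notification. */
off_t
bytes_to_print(off_t offset, off_t new_size, bool truncated)
{
	off_t from = resume_offset(offset, new_size, truncated);

	return (new_size > from ? new_size - from : 0);
}
```

Take the same 10-byte file recreated with 100 bytes. With the truncation event, all 100 bytes are printed. Without it, only 90 are, exactly as in the size-polling case.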
In terms of options for us: I think our tail -f behavior as it is
currently is pretty clearly busted, but we could easily implement the GNU
behavior (albeit without the error message and the additional semantic
oddity of not printing the new contents). However, I think that this
behavior still leaves a lot to be desired in that it gets so confused when
the size after truncation-and-write exceeds the size before truncation.
The second option is to take the BSD approach. This would take us back to
using event ports (which we ripped out as part of illumos#535), and we
would want to be mindful of the issues raised in illumos#535. In
particular, we would want to use PORT_SOURCE_FILE, not PORT_SOURCE_FD.
Even so, our existing functionality for PORT_SOURCE_FILE isn't _quite_
enough to get us there: PORT_SOURCE_FILE as it is today will tell us that
a file has been modified, but not that it's been truncated. However, it's
a very simple (and, in my opinion, reasonable) change to add a FILE_TRUNC
event (or similar) to the PORT_SOURCE_FILE events -- which would allow our
tail -f (not to mention any other PORT_SOURCE_FILE event port
consumer) to exactly match the BSD behavior. I think that this is my
preferred approach, but it's obviously a tad more involved.
Thoughts on any/all of this?
- Bryan
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-c42b328b
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-605db409
Powered by Listbox: http://www.listbox.com