-
Epoll does it with a pretty dirty (IMHO) hack; something similar for io_uring would add a good amount of overhead to every io_uring request (probably two spin lock/unlock pairs). I don't think it would be better or more performant than implementing it in userspace, not to mention the additional complexity and a couple of extra checks in the path of those who don't care about it. There could also be some optimisations, e.g. with fixed files, but that already gives me shivers.
With sockets the problem is easily solved by issuing a shutdown; I was arguing that it'd be really great to also have that for non-sockets, especially if I/O on those may never complete. That's one option, but it requires kernel changes.
Not necessarily a solution, but a couple of thoughts below on making it more efficient:
/* fast path: the bit is usually already set, so no atomic op is needed */
if (unlikely(!test_bit(file->used_by_thread_bitvec, current_thread_id))) {
    /* slow path: mark this thread as a user of the file */
    atomic_set_bit(file->used_by_thread_bitvec, current_thread_id);
}
Another way would be to dup the file when you transfer it between cores, so all threads will have to close it when they're done. How do you transfer files? Is it done explicitly via the framework API?
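For illustration, a minimal sketch of that dup-on-transfer idea (the hand-off function and its callers are hypothetical, not part of any existing API): each core ends up with its own descriptor for the same open file, so a close on one core can't race with I/O issued through another core's descriptor.

#include <unistd.h>

/* Hypothetical hand-off between cores: the receiving core gets its own
 * duplicate of the descriptor and owns closing it. dup() creates a new
 * fd referring to the same underlying open file description. */
static int fd_for_transfer(int fd)
{
    int dup_fd = dup(fd);
    if (dup_fd < 0)
        return -1;          /* out of fds, etc. */
    /* hand dup_fd to the target core via the framework's transfer API;
     * the sending core keeps, and eventually closes, its own fd */
    return dup_fd;
}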
-
why wouldn't
-
As Pavel says, shutdown(2) is good for sockets, and hopefully that covers most of your use cases? It sounds like you will need some synchronization between threads in any case, however? E.g.:
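A minimal sketch of the shutdown-then-close pattern for sockets (the drain step is left abstract; it stands in for whatever per-ring completion handling the runtime already does):

#include <sys/socket.h>
#include <unistd.h>

/* shutdown() wakes up reads/writes blocked on this socket in every
 * core's ring: they complete with 0 or an error instead of waiting
 * forever. Only after those completions are reaped is it safe to
 * actually close the descriptor. */
static void retire_socket(int sockfd)
{
    shutdown(sockfd, SHUT_RDWR);
    /* ... let each ring reap the now-completed requests ... */
    close(sockfd);
}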
-
Also note this statement ("every I/O operation adds a refcount to the fd that it's operating on") is wrong: it adds a refcount to the underlying file, not to the fd. There are also complicated cases with fixed files, where files can be open for a lot longer than you expected.
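The fixed-files point can be seen with a small liburing sketch (the file name is arbitrary): once a file is registered with the ring, io_uring holds its own reference to the underlying file, so close() on the fd alone does not release it.

#include <liburing.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    struct io_uring ring;
    io_uring_queue_init(8, &ring, 0);

    int fd = open("/tmp/example.txt", O_RDWR | O_CREAT, 0644);
    int fds[1] = { fd };

    /* registration takes a reference to the underlying struct file */
    io_uring_register_files(&ring, fds, 1);

    /* drops the fd-table reference only; the file stays open because
     * the ring still holds its registered-file reference */
    close(fd);

    /* the file is really released only when it is unregistered
     * (or the ring is torn down) */
    io_uring_unregister_files(&ring);
    io_uring_queue_exit(&ring);
    return 0;
}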
-
Context: I have a language runtime system where we have one OS thread per core, and we run a user-space thread scheduler for lightweight threads on each core. To add support for io_uring, we would want to have one io_uring per core, so we can issue and collect I/O operations from a single OS thread.
The difficulty is with closing fds. The fds are shared across all cores: any lightweight thread on any core can issue I/O on any fd. As discussed in #932, to close an fd in a robust way, we have to cancel any outstanding I/O on the fd too (otherwise the fd doesn't actually get closed, and we could have resource leaks for lightweight threads still blocked on I/O for that fd).
The problem is, there may be outstanding I/O operations on any core's io_uring, not just the ring where the close is being issued. So either we need a shared data structure to know whether operations are outstanding on other cores (and arrange to issue cancels on those cores), or we need to blindly issue a cancel on each of the cores. This quickly becomes very expensive in terms of cross-core traffic. And it's frustrating to always pay this cost just to support sloppy applications.
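For concreteness, the blind variant might look like the following sketch, assuming kernel 5.19+ and liburing 2.2+ (where IORING_ASYNC_CANCEL_FD can cancel everything outstanding on an fd in one request) and a hypothetical per-core ring array; in a real runtime each prep/submit would have to run on the ring's owning thread:

#include <liburing.h>
#include <unistd.h>

/* assumed: one ring per core, each owned by that core's OS thread */
extern struct io_uring rings[];
extern int num_cores;

static void cancel_everywhere_then_close(int fd)
{
    for (int i = 0; i < num_cores; i++) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(&rings[i]);
        if (!sqe)
            continue;   /* SQ full; a real runtime would retry */
        /* cancel every outstanding request on fd in this ring */
        io_uring_prep_cancel_fd(sqe, fd, IORING_ASYNC_CANCEL_ALL);
        io_uring_submit(&rings[i]);
    }
    /* ... wait for the cancellation CQEs from every ring ... */
    close(fd);
}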
Does anyone have any good ideas?
Unfortunately it isn't a solution to insist that fds are limited to use on a single core. This isn't something that can be changed in this context. The user's lightweight threads can communicate with each other and pass fds (or the higher-level library handles that contain an fd) between themselves. For example, a classic socket accept loop would accept an fd on (a lightweight thread on) one core and then fork a lightweight thread (which could go to any core) to handle the new connection.
It's also not a robust solution to insist that the user application cancel all I/O operations before closing. Yes, that's what applications should do, but a language runtime has to cope with sloppy user applications too (e.g. by throwing exceptions to the lightweight threads still blocked on I/O on an fd that another thread closed).
It seems to me the best approach would be if io_uring could have a mode where close on an fd would automagically cancel any poll waiters on that fd. It wouldn't be necessary to cancel all ongoing I/O, just the I/O operations that can wait indefinitely. This is the way that select, poll and epoll work (epoll is what the language runtime in question currently uses). With epoll, closing an fd that is registered in an epoll set will generate a notification for that fd with an error.
As I understand it, with io_uring, every I/O operation adds a refcount to the fd that it's operating on, and that is what stops close from really closing files/pipes/sockets, because the refcount is not 0. But somehow select, poll and epoll manage to work without this behaviour of keeping the file alive while the poll is outstanding. Perhaps io_uring could do the same, at least for operations waiting in the poll set. My guess is that if this isn't a zero-cost thing, it'd be behaviour best requested via a flag to io_uring_setup.
Thoughts?
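Until something like that exists, one userspace approximation is to tag the indefinite-wait poll requests so that close can cancel just those. A minimal sketch assuming liburing 2.x; the tagging scheme and helper names are made up for illustration:

#include <liburing.h>
#include <poll.h>
#include <unistd.h>

/* hypothetical tagging scheme: encode the fd into the poll request's
 * user_data so its waiters can be found again at close time */
#define POLL_TAG(fd) ((__u64)(fd) | (1ULL << 63))

static void add_poll_waiter(struct io_uring *ring, int fd)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    io_uring_prep_poll_add(sqe, fd, POLLIN);
    io_uring_sqe_set_data64(sqe, POLL_TAG(fd));
}

/* cancel only the indefinite poll waiters, not short-lived I/O,
 * then close once the completions have been reaped */
static void cancel_poll_then_close(struct io_uring *ring, int fd)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    io_uring_prep_poll_remove(sqe, POLL_TAG(fd));
    io_uring_submit(ring);
    /* ... reap the poll-remove CQE and the cancelled poll's CQE ... */
    close(fd);
}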