Multiplexing filehandles with select() in perl.
The problem
I/O requests such as read() and write() are blocking requests. Suppose you have a line in a program that reads
from STDIN on a terminal, like the following:
$input = <STDIN>;
What will happen here is that the program's execution will block until a line of input is available, i.e. until the
user types something followed by a newline. In many cases this is the desired behavior, but not always. Suppose
you have a program that accepts requests through a socket, does some processing for each request, and then
moves on to the next request.
01 use IO::Socket::INET;                      # TCP socket class
02 my $s = IO::Socket::INET->new(             # create the receiving socket
03     LocalHost => 'thekla',
04     LocalPort => 7070,
05     Proto     => 'tcp',
06     Listen    => 16,
07     Reuse     => 1,
08 );
09 die "Could not create socket: $!\n" unless $s;
10
11 my ($ns, $buf);
12 while( $ns = $s->accept() ) {              # wait for and accept a connection
13     while( defined( $buf = <$ns> ) ) {     # read from the socket
14         # do some processing
15     }
16 }
17 close($s);
Although this is a perfectly valid way of handling the incoming requests, it suffers from some serious problems,
especially if the frequency of incoming requests is high and each request needs heavy processing.
Clearly, the problem is that, once a request has been accepted, we have to keep other requests hanging in
the queue while we read the request message and process it. Now, reading from a socket is a blocking call,
so if the client takes too long to transmit the request message, we just sit there waiting while we could be
doing useful processing of other requests. Not only is this unacceptable, but in cases where the
demand for request processing is high, the program may not be able to meet its operating requirements. Also
consider that a single client failure at a critical point (in the middle of an ongoing transmission) poses the risk of
making the server block indefinitely.
What can we do about it?
What we need in order to deal with situations like the above is a way to handle I/O (we use sockets for this
example, but the rules apply in general to any kind of filehandle) independently and with some sort of apparent
parallelism/multiprocessing. There are two very common approaches to this.
One approach is to spawn separate threads of control to handle each request. This can be done either at
process level, using fork() to create a new process for each request, or at thread level, using perl's
threading capabilities to create multiple threads within the same process. (Perl's support for threads was
introduced in version 5.005.)
The other approach, which is the one that we will discuss here, is to use select() to multiplex
between several filehandles within a single thread of control, thus creating the effect of parallelism in the
handling of I/O.
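For comparison, the first approach (one process per request) can be sketched roughly as follows. This is only a minimal illustration: serve_in_child() is a hypothetical helper name, and $handler is assumed to be a code reference that does the per-connection work.

```perl
use strict;
use warnings;

# Hypothetical helper: hand one accepted connection to a child process.
# $listener is the listening socket, $ns the accepted connection and
# $handler a code reference that talks to the client.
sub serve_in_child {
    my ($listener, $ns, $handler) = @_;

    my $pid = fork();
    die "fork failed: $!\n" unless defined $pid;

    if ($pid == 0) {            # child process
        close($listener);       # the child never accepts connections
        $handler->($ns);        # may block without stalling the parent
        close($ns);             # flushes any buffered output
        exit 0;
    }

    close($ns);                 # parent: the child owns the connection now
    return $pid;
}
```

The parent would call this once for every socket returned by accept() and go straight back to accepting. A real server would also have to reap finished children, which is left out here.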
What does select() do?
The idea behind select() is to avoid blocking calls by making sure that a call will not block before
attempting it. How do we do that? Suppose we have two filehandles, and we want to read data from them as it
comes in. Let's call them A and B. Now, let's assume that A has no input pending yet, but B is ready to
respond to a read() call. If we know this bit of information, we can try reading from B first, instead of A,
knowing that our call will not block. select() gives us this bit of information. All we need to do is define
sets of filehandles (one for reading, one for writing and one for errors) and call select() on them. It
returns the filehandles that are ready to perform the operation for which they have been designated (depending
on which set they are in) as soon as at least one such filehandle is ready.
Obviously this gives us the advantage of always picking a filehandle that will not block, thus
avoiding the possibility of delaying the entire program for one lazy filehandle just because it happened to be
the first we picked at random. Still, it does not guarantee that the selected filehandle is the best choice,
because we still don't know how much data can be read from it, or how quickly it can take in data that we write
to it. But it is definitely a big step forward from our initial program.
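The A and B scenario can be tried out in a few lines. The sketch below is only an illustration: it uses two pipes to stand in for A and B (the variable names are made up for this example) and the IO::Select wrapper module, polling once with a timeout of 0.

```perl
use strict;
use warnings;
use IO::Select;

# Two pipes stand in for filehandles A and B (illustrative names).
pipe(my $a_read, my $a_write) or die "pipe A failed: $!\n";
pipe(my $b_read, my $b_write) or die "pipe B failed: $!\n";

# Only B receives data; syswrite() bypasses perl's output buffering.
syswrite($b_write, "hello from B\n");

my $read_set = IO::Select->new($a_read, $b_read);

# can_read(0) polls once and returns the handles that can be read
# right now without blocking.
my @ready = $read_set->can_read(0);

foreach my $rh (@ready) {
    my $name = ($rh == $b_read) ? 'B' : 'A';
    sysread($rh, my $data, 1024);
    print "$name is ready: $data";
}
```

Only B is reported as ready here, so the program never attempts the read on A that would have blocked.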
Using select()
We will now rewrite the example program from the beginning of this article, this time using the
select() method. Instead of using perl's select call directly, we will use a wrapper module, IO::Select,
that makes life easier for us.
... create socket as before ...
11 use IO::Select;
12 my $read_set = IO::Select->new();   # create handle set for reading
13 $read_set->add($s);                 # add the main socket to the set
14
15 while (1) {                         # forever
16     # get a set of readable handles (blocks until at least one handle is ready)
17     my ($rh_set) = IO::Select->select($read_set, undef, undef, undef);
18     # take all readable handles in turn
19     foreach my $rh (@$rh_set) {
20         # if it is the main socket then we have an incoming connection and
21         # we should accept() it and then add the new socket to the $read_set
22         if ($rh == $s) {
23             my $ns = $rh->accept();
24             $read_set->add($ns);
25         }
26         # otherwise it is an ordinary socket and we should read and process the request
27         else {
28             my $buf = <$rh>;
29             if (defined $buf) {     # we get normal input
30                 # ... process $buf ...
31             }
32             else {                  # the client has closed the socket
33                 # remove the socket from the $read_set and close it
34                 $read_set->remove($rh);
35                 close($rh);
36             }
37         }
38     }
39 }
We create an IO::Select object, $read_set, which is our set of handles to test for readability, and add all
open handles to it. We start by adding the main socket and each time a new connection is made returning a
new socket for it, we add that socket to the set. Then we go into a loop where we ask select to give us a list
of readable handles and we examine each one in turn. If it is the main socket then we want to call accept()
to receive the incoming connection and add the new socket to the read set. Otherwise it must be an ordinary
socket in which case we read from it and process its input. If the read fails, that means the socket has been
closed on the client side, so we close it, too, and remove it from the read set. So we work our way
continuously through the incoming requests, by making sure that a call for I/O on any filehandle will progress
since select() tells us it will.
As we already mentioned earlier, this method does not guarantee progress as it only tests whether a handle is
ready to respond to I/O. The question still remains, whether the handle we pick from the ready ones is the one
that will respond faster to I/O, and how much data there is available for reading or how much data it is ready
to receive. So it is still possible to block a bit after the point where we picked the handle. Also, we did not take
into account the impact on performance that the actual processing of requests will have. We might just be
printing incoming data to a file, but then again, each request might need heavy processing that would slow
down the entire handle processing loop. But these are issues that must be considered in the context of the
individual application.
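One way to reduce the chance of blocking after a handle has been picked is to read with sysread() instead of the line-oriented <$rh>. sysread() returns whatever data is available right now rather than waiting for a complete line, and it does not go through perl's input buffer, which select() cannot see. The sketch below shows one possible shape for this; extract_lines() and poll_once() are hypothetical helper names, and the 4096-byte read size is an arbitrary choice.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# Hypothetical helper: remove every complete line from the buffer that
# $buf_ref points to and return them; a trailing partial line stays put.
sub extract_lines {
    my ($buf_ref) = @_;
    my @lines;
    while ($$buf_ref =~ s/\A([^\n]*\n)//) {
        push @lines, $1;
    }
    return @lines;
}

# Hypothetical helper: one round of select().
#   $listener  listening socket, already a member of $read_set
#   $read_set  IO::Select object holding all open handles
#   $pending   hash reference with the partial input of each handle
#   $on_line   callback, called as $on_line->($handle, $line)
#   $timeout   seconds to wait; undef blocks until a handle is ready
sub poll_once {
    my ($listener, $read_set, $pending, $on_line, $timeout) = @_;

    foreach my $rh ($read_set->can_read($timeout)) {
        if ($rh == $listener) {              # incoming connection
            my $ns = $listener->accept() or next;
            $read_set->add($ns);
            $pending->{$ns} = '';
            next;
        }

        # Read whatever is available without waiting for a newline.
        my $n = sysread($rh, my $chunk, 4096);
        if ($n) {
            $pending->{$rh} .= $chunk;
            $on_line->($rh, $_) for extract_lines(\$pending->{$rh});
        }
        else {                               # 0 means EOF, undef an error
            delete $pending->{$rh};
            $read_set->remove($rh);
            close($rh);
        }
    }
}
```

A server would create $read_set with the listening socket in it and then call poll_once() in an endless loop, passing undef as the timeout. Because partial lines are kept per handle, a client that sends half a request and then stalls no longer holds up the others.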
Comments
David | Posted at 11:33pm on Wednesday, June 27th, 2007 | Excellent tutorial, thank you! |
max | Posted at 6:45pm on Friday, July 13th, 2007 | great read |
Vijay | Posted at 8:20am on Wednesday, July 18th, 2007 | A very good tuturial, very clearly explained. Thank you. |
Anis | Posted at 4:10am on Friday, July 20th, 2007 | Excellent thanks.. |
Shahid Khan | Posted at 1:06am on Thursday, October 18th, 2007 | SO useful information |
Jonathan Perkin | Posted at 6:09am on Tuesday, January 29th, 2008 | The last argument to select() should really be undef, so that it blocks until ready. A timeout of 0 means continuously check, so it chews up 100% CPU. |
Wilko | Posted at 1:24pm on Monday, February 4th, 2008 | Very good article thank you. Thanks to Jonathan P aswell! I was maxxing out the CPU whilst the server was waiting for incoming connections. Changing the 0 to undef worked a treat. Thanks again |
DimeCadmium | Posted at 1:02pm on Tuesday, March 4th, 2008 | A better way (IMO) to get the error message (more details): $@
Also, I use:
new IO::Socket::INET(...) or die "No socket: $!/$@\n"; |
alpha | Posted at 10:56pm on Friday, March 14th, 2008 | You should not mix buffered input, i.e. <>, with select. Use {select/sysread/syswrite} or {print/<>/read/write} |
He Man | Posted at 4:42am on Wednesday, March 19th, 2008 | NICE.... |
Rick | Posted at 4:49pm on Friday, April 25th, 2008 | I believe line 28 is a blocking IO statement. If the other end of the connection went away during an IO, this entire app will wait on that line. I tested this via Telnet as the other end - when I type my first character, the IO::Select detects it and then it blocks at line 28 until I hit carriage return. Does anyone have a solution to this issue? |
MattCarter | Posted at 11:39am on Wednesday, July 30th, 2008 | As alpha pointed out, the IO::Select example above has a serious flaw: The diamond operator (<>) (the shortcut for readline()) does buffered I/O. The buffer that perl uses for the diamond operator is NOT visible to IO::Select. So, the above code will hang in the subsequent select call if multiple lines arrive simultaneously. To avoid this problem, the perl program must use unbuffered IO calls like sysread(...) . |
Ankit Kapoor | Posted at 12:06pm on Sunday, November 9th, 2008 | Xcellent Tutorial! |
Suggested Reading
Advanced Perl Programming, among various other very interesting subjects, dedicates a chapter to socket
programming, providing a very clear and to-the-point approach to the issue. It includes a short discussion of
select() and its use in handling sockets. It is also a good book to have in general if you're seriously
interested in perl programming.