Perlfect Solutions
 

Multiplexing filehandles with select() in perl.

The problem

I/O calls such as read() and write() are blocking calls. Suppose you have a line in a program that reads from STDIN on a terminal, like the following:

$input = <STDIN>;

What happens here is that the program's execution blocks until a line of input is available, i.e. until the user types something followed by a newline. In many cases this is the desired behavior. Now suppose you have a program that accepts requests through a socket, does some processing for each request, and then moves on to the next request.

    use IO::Socket::INET;

    # Create the receiving socket
    my $s = IO::Socket::INET->new(
        LocalHost => 'thekla',
        LocalPort => 7070,
        Proto     => 'tcp',
        Listen    => 16,
        Reuse     => 1,
    );
    die "Could not create socket: $!\n" unless $s;

    my ($ns, $buf);
    while ( $ns = $s->accept() ) {             # wait for and accept a connection
        while ( defined( $buf = <$ns> ) ) {    # read from the socket
            # do some processing
        }
        close($ns);                            # done with this client
    }
    close($s);

Although this is a perfectly valid way of handling incoming requests, it suffers from some serious problems, especially if requests arrive frequently and each one needs a lot of processing.

Clearly, the problem is that, once a request has been accepted, we have to keep other requests hanging in the queue while we read the request message and process it. Reading from a socket is a blocking call, so if the client takes too long to transmit the request message, we just sit there waiting while we could be doing useful processing of other requests. Not only is this unacceptable, but in cases where the demand for request processing is high, the program may not be able to meet its operating requirements. Note also that a single client failure at a critical point (in the middle of an ongoing transmission) poses the risk of making the server block indefinitely.

What can we do about it?

What we need in order to deal with situations like the above is a way to handle I/O independently and with some sort of apparent parallelism/multiprocessing. (We use sockets for this example, but the rules apply in general to any kind of filehandle.) There are two very common approaches to this.

One approach is to spawn separate threads of control to handle each request. This can be done either at process-level, using fork() to create a new process for each request, or at thread-level using perl's threading capabilities to create multiple threads within the same process. (Perl's support for threads was introduced in version 5.005)
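For comparison, the process-level variant of this approach can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not part of the original example: the helper name serve_with_fork, the echo-style "processing" and the port in the usage comment are all illustrative.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use POSIX qw(:sys_wait_h);

# Hypothetical helper: accept connections on $server forever and hand each
# one to a freshly forked child process.
sub serve_with_fork {
    my ($server) = @_;

    # Reap finished children so they do not pile up as zombies.
    local $SIG{CHLD} = sub { 1 while waitpid(-1, WNOHANG) > 0 };

    while (1) {
        my $client = $server->accept();
        unless ($client) {
            next if $!{EINTR};          # interrupted by SIGCHLD: just retry
            die "accept failed: $!\n";
        }

        my $pid = fork();
        die "fork failed: $!\n" unless defined $pid;

        if ($pid == 0) {                # child: serve this one client, then exit
            close($server);
            while ( defined( my $line = <$client> ) ) {
                print {$client} $line;  # placeholder processing: echo the line
            }
            exit 0;
        }
        close($client);                 # parent: return to accept() at once
    }
}

# Usage (assumed port, matching the listing above):
#   my $server = IO::Socket::INET->new(
#       LocalPort => 7070, Proto => 'tcp', Listen => 16, ReuseAddr => 1,
#   ) or die "Could not create socket: $@\n";
#   serve_with_fork($server);
```

A slow client now only stalls its own child process, at the cost of one fork() per connection.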

The other approach - which is the one that we will discuss here - is to use select() to multiplex between several filehandles within a single thread of control, thus creating the effect of parallelism in the handling of I/O.

What does select() do?

The idea behind select() is to avoid blocking calls by making sure that a call will not block before attempting it. How do we do that? Suppose we have two filehandles, A and B, and we want to read data from them as it comes in. Assume that A has no input pending yet, but B is ready to respond to a read() call. If we know this, we can read from B first, instead of A, knowing that our call will not block. select() gives us exactly this information. All we need to do is define sets of filehandles (one for reading, one for writing and one for errors) and call select() on them. As soon as at least one handle is ready, it returns the filehandles that are ready to perform the operation for which they were designated (depending on which set they are in).

Obviously this gives us the advantage of always picking a filehandle that will not block, so one lazy filehandle cannot delay the entire program just because it happened to be the first one we picked at random. Still, it does not guarantee that the selected filehandle is the best choice, because we still don't know how much data can be read from it, or how quickly it can take in data that we write to it. But it is definitely a big step forward from our initial program.
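The A/B scenario can be tried out in a few lines. The following is a small sketch, assuming two pipes as stand-ins for the filehandles A and B; the variable names and the one-second timeout are illustrative choices, and it uses the IO::Select wrapper module rather than the raw select() call.

```perl
use strict;
use warnings;
use IO::Select;

# Two pipes stand in for the filehandles A and B from the text.
pipe(my $a_read, my $a_write) or die "pipe A: $!\n";
pipe(my $b_read, my $b_write) or die "pipe B: $!\n";

# Only B has input pending; nothing is ever written to A here.
syswrite($b_write, "data for B\n");

my $read_set = IO::Select->new($a_read, $b_read);

# can_read() returns the handles on which a read would not block right now.
# The 1-second timeout only guards against waiting forever in this demo.
my @ready = $read_set->can_read(1);

for my $fh (@ready) {
    my $name = $fh == $b_read ? 'B' : 'A';
    sysread($fh, my $chunk, 1024);
    print "$name is ready: $chunk";    # prints "B is ready: data for B"
}
```

Reading from A at this point would block, which is why it does not show up in the list of ready handles.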

Using select()

We will now rewrite the example program from the beginning of this article, this time using select(). Instead of using perl's select call directly, we will use a wrapper module, IO::Select, that makes life easier for us.

    ... create socket as before ...
    11  use IO::Select;
    12  my $read_set = IO::Select->new();  # create handle set for reading
    13  $read_set->add($s);                # add the main socket to the set
    14
    15  while (1) { # forever
    16      # get a set of readable handles (blocks until at least one handle is ready)
    17      my ($rh_set) = IO::Select->select($read_set, undef, undef, undef);
    18      # take all readable handles in turn
    19      foreach my $rh (@$rh_set) {
    20          # if it is the main socket then we have an incoming connection and
    21          # we should accept() it and then add the new socket to the $read_set
    22          if ($rh == $s) {
    23              my $ns = $rh->accept();
    24              $read_set->add($ns);
    25          }
    26          # otherwise it is an ordinary socket and we should read and process the request
    27          else {
    28              my $buf = <$rh>;
    29              if (defined $buf) { # we get normal input
    30                  # ... process $buf ...
    31              }
    32              else { # the client has closed the socket
    33                  # remove the socket from the $read_set and close it
    34                  $read_set->remove($rh);
    35                  close($rh);
    36              }
    37          }
    38      }
    39  }

We create an IO::Select object, $read_set, which is our set of handles to test for readability, and add all open handles to it. We start by adding the main socket, and each time a new connection is accepted we add the socket returned for it to the set. Then we go into a loop where we ask select to give us a list of readable handles and we examine each one in turn. If it is the main socket, we call accept() to receive the incoming connection and add the new socket to the read set. Otherwise it must be an ordinary socket, in which case we read from it and process its input. If the read fails, that means the socket has been closed on the client side, so we close it, too, and remove it from the read set. So we work our way continuously through the incoming requests, only attempting I/O on a filehandle once select() has told us that it is ready.

As we already mentioned earlier, this method does not guarantee progress as it only tests whether a handle is ready to respond to I/O. The question still remains, whether the handle we pick from the ready ones is the one that will respond faster to I/O, and how much data there is available for reading or how much data it is ready to receive. So it is still possible to block a bit after the point where we picked the handle. Also, we did not take into account the impact on performance that the actual processing of requests will have. We might just be printing incoming data to a file, but then again, each request might need heavy processing that would slow down the entire handle processing loop. But these are issues that must be considered in the context of the individual application.
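One way to reduce the risk of blocking after a handle has been picked is to avoid line-oriented reads such as <$rh>, which wait for a complete line, and to call sysread() instead, collecting partial input in a buffer per handle. The following is a sketch of that idea, not the method used in the listing above: the helper read_lines, the %pending hash, the 4096-byte read size and the socketpair standing in for a client are all assumptions made for illustration.

```perl
use strict;
use warnings;
use IO::Select;
use Socket;

my %pending;    # per-handle buffer of bytes that do not yet form a full line

# Hypothetical helper: read whatever is available on $fh with one sysread()
# and return a reference to the complete lines now in that handle's buffer.
# Returns undef once the peer has closed the connection (or on a read error).
sub read_lines {
    my ($fh) = @_;
    my $n = sysread($fh, my $chunk, 4096);   # handle was reported readable
    return undef unless $n;                  # 0 = EOF, undef = read error
    $pending{$fh} .= $chunk;
    my @lines;
    push @lines, $1 while $pending{$fh} =~ s/\A([^\n]*\n)//;
    return \@lines;
}

# Demo: a socketpair stands in for a client connection.
socketpair(my $client, my $server_side, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!\n";
syswrite($client, "first\nsecond\npart");   # two full lines and a fragment
close($client);

my $read_set = IO::Select->new($server_side);
my @requests;
while ( $read_set->count ) {
    for my $fh ( $read_set->can_read ) {     # no timeout: block until ready
        my $lines = read_lines($fh);
        if ($lines) {
            push @requests, @$lines;         # ... process each request ...
        }
        else {                               # peer closed: forget the handle
            delete $pending{$fh};
            $read_set->remove($fh);
            close($fh);
        }
    }
}
print "got: $_" for @requests;
```

In this sketch a client that sends half a line and then stalls only leaves a fragment in its own buffer, and the loop goes back to waiting on all handles. The time spent processing each request is still not addressed.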


Comments

David   

Posted at 11:33pm on Wednesday, June 27th, 2007

Excellent tutorial, thank you!

max   

Posted at 6:45pm on Friday, July 13th, 2007

great read

Vijay   

Posted at 8:20am on Wednesday, July 18th, 2007

A very good tutorial, very clearly explained. Thank you.

Anis   

Posted at 4:10am on Friday, July 20th, 2007

Excellent thanks..

Shahid Khan   

Posted at 1:06am on Thursday, October 18th, 2007

SO useful information

Jonathan Perkin   

Posted at 6:09am on Tuesday, January 29th, 2008

The last argument to select() should really be undef, so that it blocks until ready. A timeout of 0 means continuously check, so it chews up 100% CPU.

Wilko   

Posted at 1:24pm on Monday, February 4th, 2008

Very good article, thank you. Thanks to Jonathan P as well! I was maxing out the CPU whilst the server was waiting for incoming connections. Changing the 0 to undef worked a treat. Thanks again

DimeCadmium   

Posted at 1:02pm on Tuesday, March 4th, 2008

A better way (IMO) to get the error message (more details): $@
Also, I use:
new IO::Socket::INET(...) or die "No socket: $!/$@\n";

alpha   

Posted at 10:56pm on Friday, March 14th, 2008

You should not mix buffered input, i.e. <FH>, with select. Use {select/sysread/syswrite} or {print/<FH>/read/write}

He Man   

Posted at 4:42am on Wednesday, March 19th, 2008

NICE....

Rick   

Posted at 4:49pm on Friday, April 25th, 2008

I believe line 28 is a blocking IO statement. If the other end of the connection went away during an IO, this entire app will wait on that line. I tested this via Telnet as the other end - when I type my first character, the IO::Select detects it and then it blocks at line 28 until I hit carriage return. Does anyone have a solution to this issue?

MattCarter   

Posted at 11:39am on Wednesday, July 30th, 2008

As alpha pointed out, the IO::Select example above has a serious flaw: The diamond operator (<>) (the shortcut for readline()) does buffered I/O. The buffer that perl uses for the diamond operator is NOT visible to IO::Select. So, the above code will hang in the subsequent select call if multiple lines arrive simultaneously. To avoid this problem, the perl program must use unbuffered IO calls like sysread(...) .

Ankit Kapoor   

Posted at 12:06pm on Sunday, November 9th, 2008

Xcellent Tutorial!

Comments to date: 13.


Suggested Reading

Advanced Perl Programming, among various other very interesting subjects, dedicates a chapter to socket programming, providing a very clear and to-the-point approach to the issue. There is a short discussion on select() and its use to manipulate sockets. It is also a good book to have in general if you're seriously interested in perl programming.