[paludis-user] Workaround for problem with distcc + outputwrapper

Benjamin R. Haskell paludis at benizi.com
Mon Jan 18 03:25:22 UTC 2010


Summary:  Running 'distcc' w/ zeroconf support under the attached 
wrapper (as 'with-closed-fds distcc [distcc options]') prevents odd 
hangs.


Long version:

[This explanation ended up a little longer than I'd've liked, but 
nonetheless, might be interesting...]

Sometime between version 0.36-or-so and now, I noticed that there would 
often be odd pauses when running even very simple commands with paludis.  
It turns out to be a weird interaction between distcc, zeroconf, and 
outputwrapper.

In /etc/distcc/hosts:

+zeroconf

In my /etc/paludis/bashrc, among a bunch of other things:

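if [ "$_OK_DISTCC" = "yes" ] && distcc --version &> /dev/null ; then
    DISTCC_DIR="/var/tmp/paludis/.distcc"
    PATH="/usr/lib/distcc/bin:$PATH"
    SANDBOX_WRITE="$SANDBOX_WRITE:$DISTCC_DIR"
    # Unless _make_jobs is already set, ask distcc what is available now...
    : ${_make_jobs="$(distcc -j)"}
    # ...fall back to 2 if that came back empty...
    : ${_make_jobs:=2}
    # ...and use half of it.
    _make_jobs=$(( $_make_jobs / 2 ))
fi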
#...
MAKEOPTS="$MAKEOPTS -j$_make_jobs"

The $_OK_DISTCC check allows me to disable distcc (on gcc builds, for 
example, where the parallelism seems to cause out-of-order problems, or 
when I just don't want the excess load on my system).  And the 
_make_jobs=$(distcc -j) setting looks for how many distcc hosts are 
available at the time.

The problem is that 'distcc -j' with zeroconf fires off a daemon.  (See 
http://lists.samba.org/archive/distcc/2004q4/002774.html for the 
justification.  Basically: the startup cost of collecting mDNS 
information is worth avoiding in a build that calls distcc many times.)

I saw in paludis/util/output_wrapper.cc that 'outputwrapper' does a wait 
for its child to finish.  And I saw in distcc's src/zeroconf.c that it 
does a pretty standard daemonization process:

pid = fork()

in the child:
1. close fd's 0,1,2
2. open "/dev/null", dup it twice, making sure fd's are 0, 1, and 2
3. chdir "/"
4. on systems that have it, setsid()
5. collect the info, and wait up to 20 seconds for further zeroconf queries

So, from the way 'outputwrapper' works, the problem is that 
'outputwrapper's fd's aren't in the set that gets closed by 'distcc' 
before daemonizing.  The daemon thus keeps them open while it waits out 
its 20 seconds, and paludis would pause for that long every time 
'bashrc' got sourced, unless I happened to have run 'distcc' outside of 
paludis (so that the daemon was already running outside of 
outputwrapper).
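
To make the mechanism concrete, here is a minimal C sketch of it.  It is 
not distcc's or paludis's code: the function names are made up for 
illustration, and it assumes (my reading of the description above, not 
something checked against output_wrapper.cc) that the inherited fd is 
the write end of a pipe the parent is reading from.  A short-lived child 
forks a "daemon" that redirects only fd's 0-2, and the parent then 
checks whether the pipe still has a writer after the child has exited.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the daemonization steps listed above.  It only
 * touches fds 0, 1 and 2; anything above 2 stays exactly as inherited.
 * Returns 0 on success, -1 on failure. */
static int detach_stdio_only(void)
{
    int fd;

    close(0);
    close(1);
    close(2);
    fd = open("/dev/null", O_RDWR);    /* lowest free fd, so 0 */
    if (fd != 0 || dup(fd) != 1 || dup(fd) != 2)
        return -1;
    if (chdir("/") != 0)
        return -1;
    (void)setsid();
    return 0;
}

/* Returns 1 if the pipe's write end is still open somewhere after the
 * direct child has exited, 0 if the reader already sees EOF, and -1 on
 * error.  Assumes fds 0-2 are open in the caller, so that the pipe
 * lands on fds above 2. */
int leaked_fd_delays_eof(unsigned daemon_seconds)
{
    int p[2], status, leaked;
    pid_t child, daemon_pid;
    char buf;
    ssize_t n;

    if (pipe(p) != 0)
        return -1;

    child = fork();
    if (child < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (child == 0) {
        /* Stand-in for 'distcc -j': start the daemon, exit at once. */
        close(p[0]);
        daemon_pid = fork();
        if (daemon_pid == 0) {
            (void)detach_stdio_only(); /* p[1] is above 2: it survives */
            sleep(daemon_seconds);     /* stand-in for the 20s linger */
            _exit(0);
        }
        _exit(daemon_pid < 0 ? 1 : 0);
    }

    close(p[1]);
    if (waitpid(child, &status, 0) != child
        || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(p[0]);
        return -1;
    }

    /* The direct child is gone.  A non-blocking read now returns 0 (EOF)
     * if no writer is left, or fails with EAGAIN if one still exists. */
    (void)fcntl(p[0], F_SETFL, O_NONBLOCK);
    n = read(p[0], &buf, 1);
    leaked = (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK));
    close(p[0]);
    return leaked;
}
```

Calling leaked_fd_delays_eof(20) mirrors the 20-second case: waitpid on 
the direct child returns right away, but anything that waits for EOF on 
the pipe stays blocked until the daemon exits.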

I'm going to suggest on the distcc list that the daemonization process 
close a larger set of fd's.  (There is a similar problem with some 
leaked fd's in the LVM2 utilities.  I've never corresponded with that 
community, but I'll try to find them, too.)  In the meantime, I wanted 
to share the workaround in case anyone else was having trouble (seems 
unlikely).
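
The attachment was scrubbed from the archive, so the following is not 
the original with-closed-fds.c.  It is a minimal sketch of how such a 
wrapper could work, assuming the idea is simply "close everything above 
stderr, then exec the command".  The function names and the 65536 cap 
are my own choices for illustration.

```c
#include <unistd.h>

/* Close every fd in the half-open range [lowest, end).  close() failing
 * with EBADF on an fd that was never open is expected and ignored. */
void close_fd_range(int lowest, int end)
{
    for (int fd = lowest; fd < end; fd++)
        close(fd);
}

/* Close everything above stderr, then replace this process with the
 * command in argv.  Only returns (with -1) if there is no command or
 * execvp() fails. */
int exec_with_closed_fds(char *const argv[])
{
    long limit;

    if (argv == NULL || argv[0] == NULL)
        return -1;

    limit = sysconf(_SC_OPEN_MAX);
    if (limit < 0 || limit > 65536)
        limit = 65536;  /* assumed cap, so a huge limit can't stall us */

    close_fd_range(3, (int)limit);
    execvp(argv[0], argv);
    return -1;
}
```

A main() that calls exec_with_closed_fds(argv + 1) and exits with a 
failure status if it returns would give the 'with-closed-fds distcc 
[distcc options]' invocation from the summary.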

[Assuming anyone reading this far is very patient...]  I also wanted to 
poll paludis dev's on whether they think the problem is in the 'distcc' 
code rather than in paludis.  This seems to be a pretty common 
daemonization pattern (fork, reopen fd's 0-2 on /dev/null, chdir "/", 
and setsid).  Might there be other programs affected by this?  Would it 
be better to waitpid on a specific child's pid in output_wrapper.cc?  Or 
is spawning a daemon a rare enough thing (and maybe even 'wrong' in some 
sense) for a child process to do that it's not worth the effort?

Best,
Ben
-------------- next part --------------
A non-text attachment was scrubbed...
Name: with-closed-fds.c
Type: text/x-c
Size: 561 bytes
Desc: 
URL: <http://lists.exherbo.org/pipermail/paludis-user/attachments/20100117/eff49e8a/attachment.c>

