[Exherbo-dev] Current status of sandboxing

Ali Polatel alip at exherbo.org
Sat Nov 12 03:04:56 GMT 2011


In this mail, I am going to try to explain the current status of sydbox,
the problems it aims to solve together with some concepts and philosophy
of process tracing, sandboxing and debugging in general.

If you are reading this with the hope or intention of making sydbox
faster, please stop reading right here. Seriously, being fast never was,
and will never be, a priority of sydbox, nor should it be. If you are after
faster installations, feel free to export PALUDIS_DO_NOTHING_SANDBOXY in
your precious environment. Take this as your first contribution to Exherbo
by not disturbing anyone about non-issues.

For the rest who are still reading, it may be a long, boring read but I
doubt it will be hard to understand, considering I assume no previous
knowledge of sydbox or sandboxing in general. (This is partly because I
may have to get my girlfriend to read this mail as an excuse for staying
up so late...) I have briefly explained the concepts where I felt it
necessary. Feel free to skip them if you have previous knowledge. I
kindly expect you to comment, share your own ideas and contribute.
Otherwise you get to accept my decisions - which are not really concrete
at the moment - or export the environment variable mentioned in the
previous paragraph *or* replace me with someone else who can do it
without whining. I should also stress that this is *not* an expectation
of appreciation, rather a call for help from an amateur programmer who
feels overburdened.

When I started writing sydbox during the last months of 2008, we were
using Gentoo's LD_PRELOAD based sandbox, a mere hack to debug
misbehaving builds. In general, the intention was to implement
"a sandbox which will (hopefully) suck less". Unlike sandbox, sydbox is
based on the ptrace() system call, adding a few "cool" features like
execution and network sandboxing.

By August 2009, after about 10 releases and with Exherbo's adoption of
sydbox as its default sandboxing application, I had made serious
progress but also hit quite a few problems along the road. Some of these
problems, which I was quite unfamiliar with back then, still remain
unresolved.

I am not going to discuss whether LD_PRELOAD based sandboxing or
ptrace() is "better", but you are encouraged to. In my opinion,
both ways have their own share of problems. It is difficult for me to
provide an objective judgement due to my differing familiarity with the
problems of these two approaches.

You can find a list of sydbox's unresolved issues below. This list is
incomplete and only covers the problems which I think are the most
important and/or easy to solve with fairly little tweaking. I start
with the less specific ptrace() related problems and work my way to the
more specific sydbox related problems.

## Overall ptrace() implementation complexity

If you have ever read the manual page ptrace(2) you may have noticed the
following quote:
'The SunOS man page describes ptrace() as "unique and arcane", which it is.'
I presume the author used the word "unique" to emphasize how vastly
ptrace() differs between operating systems and, more importantly,
between architectures of the same operating system, and "arcane" to
express how peculiar the ptrace() interface is compared to the more
common interfaces provided by other system calls.

The documentation is barely enough to grasp the quirks and hidden
details so usually one has to read the source code of other projects,
gdb and strace to name a few. I am pretty sure you all use both of these
tools but reading their source code is a major pain in the ass. You can
imagine how their struggle to provide backward compatibility and
portability made the code quite hard to understand for an amateur
programmer.

I must admit, this claim is not entirely true for strace. I have never
had the chance or will to study its code thoroughly. From the first day
I started writing sydbox, my initial aim has always been to get
something working quickly followed by slow and careful addition of
features. Admittedly, I have rushed some feature additions with good
intentions, like getting more testing from what I now understand are
mostly irrelevant projects or people.

## Unsuitability of ptrace() as an access control mechanism

ptrace() is primarily used to implement breakpoint debugging and system
call tracing. Therefore, implementing basic access control like denying
access to certain system calls is easy. Still not without hacks, though.
One starts facing problems when implementing more advanced access
control. Some of these I do not know how to solve, and for others I do
not know whether ignoring them is the better idea. Occasionally this leads
to "arcane" bugs and generally to code which is difficult to maintain.

One example is resolving file system related arguments of system calls
to canonical path names without sharing the same current working
directory or file descriptor table with the traced child process. Issues
like handling of "too long" path names, also known by the errno
ENAMETOOLONG, fall into this category as well.
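
To make the first problem a bit more concrete: on Linux, a tracer can
read the child's working directory from /proc/<pid>/cwd and resolve
relative arguments against it. The Python sketch below is only an
illustration and not how sydbox does it. The name resolve_for_pid is made
up, it assumes /proc is mounted, it collapses "." and ".." purely
lexically (so it gives wrong answers when symlinks are involved) and it
does nothing about path names longer than PATH_MAX, which is exactly
where the real trouble starts.

```python
import os


def resolve_for_pid(pid, path):
    """Resolve `path` as process `pid` would see it (hypothetical helper).

    Assumes Linux with /proc mounted. Only "." and ".." are collapsed
    lexically; symlinks inside `path` are NOT followed.
    """
    if os.path.isabs(path):
        return os.path.normpath(path)
    # /proc/<pid>/cwd is a symlink to the process' current working directory.
    cwd = os.readlink("/proc/%d/cwd" % pid)
    return os.path.normpath(os.path.join(cwd, path))


if __name__ == "__main__":
    # Use our own pid as a stand-in for a traced child.
    print(resolve_for_pid(os.getpid(), "../some/file"))
```

Even this toy version has to reach outside of ptrace() and into /proc,
and it can race with the child calling chdir() between the lookup and
the actual system call.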

Other examples include figuring out the actual port when a socket system
call, like bind(), is called with port zero specified in its arguments,
dealing with spawned children and proper inheritance of sandboxing
state, whitelisting and blacklisting based on execve() whilst ignoring the
initial execve() spawning the "eldest" traced process, providing runtime
configurability through the interception of stat() related system calls
and reading from or writing to the stat buffer of the traced child
process... The list goes on.
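
The port zero case is easy to reproduce outside of a tracer. In the
sketch below, which uses nothing but Python's standard socket module,
the kernel picks a free port at bind() time and getsockname() is the
only way to learn which one. The helper name bind_ephemeral is made up
for this example. A tracer has it harder, since the socket lives in the
child's file descriptor table rather than its own, so take this as a
demonstration of the kernel behaviour and not as a way to do it under
ptrace().

```python
import socket


def bind_ephemeral():
    """Bind to port 0 on loopback and return (socket, actual_port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Port 0 asks the kernel to choose any free port.
    sock.bind(("127.0.0.1", 0))
    # The chosen port is only visible after bind(), via getsockname().
    _, port = sock.getsockname()
    return sock, port


if __name__ == "__main__":
    sock, port = bind_ephemeral()
    print("requested port 0, kernel assigned", port)
    sock.close()
```

The address seen in the arguments of bind() is therefore not enough to
decide what later connect() calls should be allowed to reach.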

During these two years, I have done my best to solve these issues
properly. Most, if not all, of them led to workarounds and hacks. A few
were my own fault: I did not wish to dedicate the time, judging the
effort not worthwhile. Many others, however, were because I hit the hard
walls of ptrace() whilst having the aim to provide the required features
or solutions.

To be honest, today, I feel I am back to square one, with one more
piece of software full of hacks and workarounds, hard to maintain, even
harder to debug. Before diving into the code again to make drastic changes which
may get as far as a rewrite from scratch, I decided to do some research
with the hope that the improvements during these two years may provide a
simpler way to achieve what we are after.

There are many new and some old solutions which were previously unknown
to me for particular problems: Linux Security Modules, clone(2) with
restrictions like CLONE_NEWNET, containers, cgroups... The list may be
extended. The important question which should let us make a decision is
what exactly we are after: debugging misbehaving builds, or proper access
control providing simple, yet configurable and reproducible, security
for package installations which is otherwise not possible? Maybe our
problem was never setting this in stone, documenting it properly and
extensively. It was certainly not clear in 2008 but today on 12th
November 2011, one day after all weirdos suddenly got married, I believe
we have fairly good experience and knowledge about the problem.

Please discuss!

P.S: I stopped writing at this point not because I am done telling my
ideas but because I want to receive your opinions before going on any
further. In addition, I have a really bad flu and ten exams in the
upcoming week. This means I must go to bed when it's past 04:30am or
I will have to face the consequences determined by my currently weak
body and an angry girlfriend...

Wish me luck! I fear I may get haunted by Ritchie's ghost!

		-alip


