Crypto (secret) – nector (bind)

•January 30, 2011

Welcome to my blog. My name is Nicolas Williams, and I’m a software engineer at Oracle, originally Sun Microsystems, but very soon to be an independent consultant.

This blog’s name derives from the Greek for secret (‘crypto’) and the Latin for bind (‘nector’). This is an allusion to my work on channel binding, a cryptographic protocol technique.  Of course, some Latin/Greek combination of ‘channel’ and ‘binding’ would have been better, but for the fact that I couldn’t find one that sounded right.  The name I settled on has the benefit of sounding like a portmanteau of “cryptography” and “connector”, though that was not my intention.

I will be blogging more here soon.

DLL Hell on Linux, but not Solaris; also: POSIX file locks are insane

•February 14, 2012

There was a very interesting thread recently on the MIT Kerberos list.  It involves DLL hell on Linux.  Tom Yu and I continued this on #krbdev for a while.  I’ve long been a fan of the Solaris RTLD, and the linker aliens (the engineers that work on it).  Features like direct binding and RTLD_GROUP have been incredibly useful at preventing all sorts of linking issues.  But it was a surprise to me that the Solaris RTLD had changed in a way that greatly reduces the risk of DLL hell; I should have been aware of this change, but I wasn’t, probably precisely because it had been made.

I’m going to plug a few things about the Solaris link-editor and RTLD:

  • Direct binding rocks (see also, and much else).  Direct binding improves performance, much like the Linux linker’s pre-linking, though not quite as much.  But more than anything it makes dynamic linking have semantics that are much closer to static linking as far as accidental symbol interposition goes: accidental symbol interposition can’t happen.  Pre-linking doesn’t get you that.  This is *huge*.  IMO -B direct needs to be made the default, at least some day.
  • The search-for-SONAMEs-in-RPATH-first-then-the-link-map feature that I just learned of is simply awesome.  I do think it violates expectations in some ways, but it’s a violation that significantly improves the system.
  • RTLD_GROUP, and generally all the RTLD_* flags that Solaris has but Linux lacks (this has varied over time, as Linux has adopted some Solaris-isms).
  • The documentation on the Solaris linker is simply one of the best things about it.  I first happened upon the Solaris linker guide in 1999 or 2000.  It has been an incredibly valuable tool to me.
  • The Solaris linker source from the last OpenSolaris build is available, and relatively easy to understand.

Now, DLL Hell arises (at least on Unix systems) when you have multiple versions of the same software installed and somehow two or more of those versions end up getting referenced and/or loaded into the same process.  Typically this happens when there are pluggable frameworks involved, such as PAM (but also name services, PKCS#11, and many other frameworks).  A typical example might be an sshd that uses the GSS-API, and through it, Kerberos, while also using PAM, and through it a PAM Kerberos module that uses a different Kerberos library version.  If only one version of the affected dependency gets loaded and it’s compatible with both of its dependents then all will be fine, else all hell breaks loose.  If both versions get loaded and each dependent links with the correct dependency, then most likely everything works out.

And here we come to the part of this post that is about POSIX file locking.  What might go wrong if two versions of a library are loaded and used in the same process?  Well, there aren’t a lot of obvious things that might go wrong.  After all, each version will have its own global variable namespace, and since the typical DLL Hell cases by their nature result in “never the twain shall meet”, it stands to reason that having two versions of a library loaded in the same process should be no different than having two different implementations of the same feature loaded in the same process.

But here’s the rub: POSIX file locks are insane.  And if two shared objects in one process use POSIX file locking to synchronize access to the same files without knowing about each other, then they’ll step all over each other’s toes, likely resulting in corruption of some sort.  This is because the first close(2) of a file descriptor drops all POSIX file locks held by the process on that file, even when those locks were acquired on a different file descriptor that references the same file.  Insane, right?

Take a look at the way SQLite3 handles POSIX file locking.  Pretty gross.

Now suppose you have two versions of libsqlite3 loaded in the same process.  If they access different DB files (and journals, and WAL files) then all will be fine.  But if they should try to access the same DB files concurrently then each instance of libsqlite3 will fail to know about the other’s file locks!  The result may well be DB corruption (I haven’t tried it).  Note that the SQLite3 developers have at times recommended static linking… (sorry, no link; search the sqlite-users list archive)…

Now, suppose that the Kerberos library were to use POSIX file locking for synchronizing access to, say, credentials cache files (ccache), keytab files, replay cache files (rcache), etcetera… (it does, actually).  Or OpenSSL, perhaps, might use POSIX file locks to control write access to trust anchors (unlikely to change often), pre-shared certs, and private keys, say (OpenSSL does no such thing, I checked, I’m asking you to imagine that it did just because OpenSSL has been frequently involved in DLL Hell in my experience).  Yeah, that could be bad.

So the biggest source of problems with having multiple instances of a library in the same process, on Unix and Unix-like systems, is likely going to be POSIX file locking.

My friend Simo tells me that POSIX file locking should just be changed to not drop all locks on first close, that it should drop only those locks that were obtained on the file descriptor being closed, and that this should be done unilaterally (and that it can already be done on Linux with a mount option).  I suspect that there are a handful of applications that depend on first-close-drops-all-locks, but not enough to make such a change too risky, so I’m coming around to his view on this.  Particularly if the RTLDs make it easier to survive DLL Hell, as Solaris’ does.

I plead with the Linux RTLD powers that be: please add RTLD_GROUP, -Bdirect, and search the rpath before the link map like Solaris does.

As for POSIX file locks, we should start a conversation about fixing them.  Replacing POSIX file locks with something similar but not broken is probably a non-starter, but I’m open to arguments to the contrary.


More on what’s wrong with PAM

•February 13, 2012

In my all-that’s-wrong-with-PAM post I should have mentioned logindevperm: the setting of ownership and permissions on the devices that make up the login “seat”.  This is a classic thing that PAM should do but typically doesn’t.

What else does PAM not do that it should?  On Solaris, for example: utmpx and /var/lastlog handling.

There’s all sorts of code in /bin/login and GDM that should be in PAM modules instead.  For example, loading and saving of session preferences in GDM, which one would think should be entirely specific to GDM, but which interacts with authentication in very special ways: it may not be possible to even attempt to access a user’s home directory until after the user’s Kerberos (or whatever) credentials are available, in which case what should GDM do for dealing with avatar images and session preferences?  Right, GDM should (and does) use /var as a cache prior to having home directory access.  But with rich enough conversations we could do all of this work in the modules (GDM-specific modules, perhaps, but then, most of these session preferences are actually generic, as are avatar pics).

If enough of these things move into PAM, and if we use a model where the login process spawns user processes, then we can collapse all of pam_authenticate(), pam_acct_mgmt(), pam_chauthtok(), pam_setcred(), and pam_open_session() into one function: a simpler API for sure!

All that’s wrong with PAM

•February 10, 2012

Yes, this post’s title is an ambitious one. The post will either not be comprehensive or it will be long. A friend jokes I’d have to write a book to be true to this post’s title, but since there would be no market for such a book, a blog it will be.

Pluggable Authentication Modules (PAM) is a user authentication technology used on some Unix and Unix-like systems, such as Solaris, Linux, and some BSDs. As the name says, it’s pluggable, and that’s about the best thing about PAM because authentication has changed a lot in the Unix space in the past twenty years, but PAM applications have not had to change quite as much.

PAM abstracts away two major aspects of Unix user authentication:

  • interaction (e.g., “Password: ” prompts)
  • sequencing of the login process (user authentication, access control, password expiration …)

I’ll assume that the reader has a passing familiarity with PAM for the rest of this blog entry.

Some of the Problems With PAM

I’m here to tell you that PAM is out of date.  Very out of date, and it needs a complete overhaul or replacement.  First let’s look at a laundry list of problems with PAM, in no particular order:

User interaction in PAM is designed with text-based login programs in mind

Nowadays graphical login programs (such as GDM) and biometric and smartcard use cases require much more advanced interaction facilities.  Not only that, but PAM has been used in web applications, via Apache’s mod_auth_pam, and text-only prompts are decidedly old hat in web applications.  (Of course, the use of user interaction via HTML forms + cookies is itself rather problematic, but let’s ignore that issue for the time being.)

Think of modern GUI login screens (see screenshots), which often have avatar selection instead of “Username: ” prompts, as well as choices of session types, locales, and accessibility options.  A rich media experience is becoming a requirement.  Those identification screens are not driven by PAM because PAM can’t do much more than prompt for a username as far as identification is concerned.

Related to this:

  • There is not enough metadata in PAM conversations (interaction).  Is a prompt for a username?  For a password?  For a new password?  For an OTP?  For a smartcard PIN?
  • There is not enough information for the modules about the capabilities of the PAM application.  Is it a GUI? a BUI? a text-based app?  The best you can do is establish conventions for PAM_SERVICE naming.

Interaction with remote authentication

There is almost none: the only information available to PAM about remote authentication is that it happened, the remote hostname, and the application name (via PAM_SERVICE).  This is a huge problem, perhaps the most important one after graphical interaction limitations.  It’d be nice if a module could handle SSHv2 pubkey authorization, or Kerberos principal-to-user-account authorization.

There’s quite a bit more to this.  For example, a Kerberos ticket might include a KDC-issued session ID (e.g., in the Windows PAC) that would be quite handy to record for auditing purposes.  And PAM is a great place to apply policy regarding, e.g., levels of assurance (LoA) and other such policies to remote authentication.  This means that merely passing the name of the authenticated client principal is insufficient.

Fixing this has the potential to greatly simplify many remote applications.  But do not get confused: the remote authentication protocols themselves should NOT be implemented by PAM, only authorization to Unix accounts and policy enforcement.

Sequencing Issues

The sequence of steps that PAM models is out of date.  In particular there needs to be an “identification” step that covers username discovery and allows for asynchronous (cancellable) interactions.  For example, a module might want to ask the user to touch a fingerprint reader or insert a smartcard, but progress might be made by the user simply typing in a username (or clicking on an avatar).  These prompts need to be cancellable because one should want the biometric/smartcard prompts to go away as soon as the user has acted on them, without the user having to click on a button in a dialog.

Other sequencing problems include the rather odd requirement that first comes authentication, then authorization, then handling of password expiration.  Some authentication methods, such as Kerberos and LDAP password validation, are not able to separate authentication from password expiration.


  • Yet another sequencing problem relates to the pam_setcred(3PAM) call and associated entry point.  The semantics of the flags for this function are not very well defined.  It’s one thing for a module to, say, write out Kerberos tickets to a credentials cache, and quite another for a module to modify the credentials of the current process — but the setcred stack and flags do not make it easy to figure out how to properly configure a PAM service.
  • PAM uses one configuration for two rather different steps in the login sequence by deriving a setcred configuration from an auth configuration.  This is extremely obnoxious.
  • Yet another sequencing problem is the idea that all passwords for a user should be changed at once.  Instead authentication modules should be free to change expired passwords as necessary (reusing any new password entered earlier if at all possible), with the balance of password modules getting the new password afterwards.  This requires separating password change readiness and local password quality policy so that they can be invoked when the first authentication module requires changing an expired password, while deferring actual password changes to post-authentication.
  • The most serious sequencing problem in PAM, IMO, is the fact that too many details of sequencing are exposed to the application.  If the application developer gets the sequencing wrong, the result can be a disastrous security vulnerability.   I.e., the PAM API is too complex because it exposes too much of the sequence of steps to the application.

Configuration Complexity and Limitations

Configuration complexity: originally PAM was easy to configure… because there were few modules and few PAM applications.  Nowadays PAM configuration almost requires a PhD, and it differs significantly from one implementation to another.

Configuration is selected by the PAM_SERVICE name — the name of the application.  But often we need to use different configurations according to who the user is (some users have OTP tokens, some have smartcards, …).

Misc. Issues

  • Incompatible API variations across implementations.  So much for portability for PAM applications….
  • Back to pam_setcred(), it’d be nice if PAM could abstract all the details of privilege / user context management.  In particular it’d be nice if it could provide functions to temporarily and permanently switch between up to three contexts: fully privileged, post authentication reduced privileges, and user context (which also generally has reduced privileges, of course).  And while we’re at it, it’d be nice if there was a convenient interface for spawning processes (think of posix_spawn(3C)) in reduced privilege and user contexts.
  • Isolation and privilege separation: some modules don’t need all privileges, but they all must run in the same process, with all privileges.  Think of OTP modules, logging modules, and modules like pam_xauth.  Some modules can -effectively- decide who gets in, how, and with what privileges; there’s no point isolating such modules from each other, but isolating them from third-party modules that don’t need privilege is critical.  One problem is that pam_get/set_data(3PAM) is a mechanism that modules use to communicate with each other, and its semantics prevent isolation.  Of course, if the only way to manage privilege were by spawning new processes in the desired context, then there would be less pressure still to run the modules in the process that’s invoking them.
  • No strong C type safety for PAM items (see pam_get/set_item(3PAM)).

I’m sure I’m leaving some problems out, but that’s enough for a start.

Towards Enhancement or Replacement

This is just a what’s-wrong-with-PAM blog entry, not a how-to-fix-or-replace-PAM blog entry, but I can’t help but touch on the latter briefly.

Some of these problems can be difficult or dangerous to fix in a backwards-compatible way.  Changing the API so that sequencing is done in fewer function calls would risk either that new applications will fail open when used with old PAM implementations, or that old applications will fail open when used with new PAM implementations.  I think a lot can be done to improve PAM backwards-compatibly, but the best approach would probably be to have a brand new API that can use old PAM modules where appropriate.  If the new API is sufficiently simple to use, then it won’t be hard to retrofit PAM applications to use it.  I’m becoming biased toward outright replacement of PAM with something new.  Although if one were willing to accept this problem then it should be possible to extend PAM instead of replacing it, in spite of the enormity of the improvements needed.

The biggest, hardest nut to crack will be the need for rich media and metadata in user interactions.  My current vision regarding graphical login apps is that they should embed a very limited web browser and JavaScript engine, with just one webpage loaded that has a JS script that interfaces with PAM/whatever.  This vision has a number of benefits: a) it substitutes BUI development for GUI development, b) it allows for much of the UI to be written in a high-level language (JavaScript), c) this definitely obtains rich media support.

The main downside to this vision is that it might encourage web developers to reuse PAM/whatever in web applications, which in turn will tend to perpetuate passwords-in-forms-POSTed-over-HTTPS-plus-cookies — something I’d rather not do.  But it’s possible that one might want to combine some cryptographic authentication protocol with additional authentication factors, with the latter implemented through a PAM-like interaction.  So I don’t want to dismiss the use of PAM or similar in web applications, not with prejudice.

With this vision the best way to represent interactions is using JSON with a schema, as that’s easy to handle in JS and it’s easier to handle in non-browser-based applications than any other possible representation that would also work for the browser case (XHTML, XML).  One benefit of using JSON instead of XHTML is that it’s easier to extend an ad-hoc JSON schema backwards-compatibly to add metadata as we find it necessary than it is to extend XHTML…  To make life simpler for /bin/login and sshd developers, a utility function could be provided with which to extract text prompts and metadata for a text-based API.

The next problem to solve would be asynchronous cancellation of prompts such as “touch finger to fingerprint reader”, “insert smartcard”, and “enter smartcard PIN” (when the smartcard has a PIN pad).  My approach for this would be to have a callback function for this.  Yes, it’d be nice to be able to tie the modules needing to issue such cancellations to the application’s I/O event loop, but doing so would require private interface contracts with PKCS#11’s C_WaitForSlotEvent() entry point (or actual extensions to PKCS#11), for one example, which leads me to believe that requiring threads and callbacks for asynchronous prompt cancellation is inevitable.

Another hard nut to crack will be simplification of configuration.  The original PAM configuration was simple because there were few modules.  By adding enough module types to match the fine-grained sequencing we can once again reduce the number of modules needed for each step, thus simplifying configuration in one way while complicating it in another (by exposing the whole sequence of login steps to the administrator).  My current idea is that ultimately this can be achieved by ensuring that there are configurations that work for every use case with very minor tweaks, then providing a simple configuration selection scheme.  Also, I’d like to group modules into sets by functionality (authentication vs. account authority vs. password sync vs…) and, for each function, specify sets of all-required and any-is-sufficient modules.  That is reminiscent of pam.conf, I know, but if we have enough discrete login steps I believe this can work.

For remote login application integration I believe we need interfaces by which applications can give to the framework the following types of objects:

  • GSS-API peer name, security context, and delegated credential objects;
  • Peer certificate (e.g., for TLS user cert authentication use cases);
  • Peer public key type and key (for SSHv2 pubkey userauth);
  • Remote IP address, port number, …;
  • The actual socket(s)?  There is getpeerucred(3C) on Solaris, after all, which someday might add value with IPsec and IPsec channels;
  • etcetera;

(In general, if we could model as much remote authentication information as possible as GSS-API name objects, it would be a huge simplicity win.  This could be as simple as defining incomplete GSS mechanisms with simple exported name token forms so that certificates, public keys, and so on could easily be imported, encapsulated in, and manipulated as a GSS-API name object.)


I could, and yet may, go into excruciating detail on some or all of the problem areas listed above, but ~2100 words is a good start.  As for extensions to PAM or a replacement for PAM, that topic really deserves a separate blog entry, or a GitHub repo.

ZFS ACL/chmod interactions in Solaris 11

•November 15, 2011

Oracle finally shipped Solaris 11. I’m guessing not many will be excited about it, since Oracle isn’t (I think) trying to get mind share for Solaris 11. But I’m excited about one thing: improved interactions between ACL and chmod in ZFS. The examples given in Oracle’s docs are instructive.

What this basically means is that a filesystem object’s mode acts as a mask on the object’s ACL, with the mask (mode) re-computed from the ACL whenever the ACL is modified, and the ACL masked whenever it is read. Changing the mode merely changes the mask and thus the effective ACL. The masking behavior is dead simple and avoids DENY ACEs, because DENY ACEs are evil: the mask simply reduces the permissions of ALLOW ACEs.

You might ask why DENY ACEs are evil. The answer has two parts. First, DENY ACEs with group subjects are evil because it’s difficult to ensure that all nodes where the ACL will be evaluated will see complete group memberships for the subject user, thus DENY ACEs with group subjects may fail to achieve their goal, and when they do they will do so silently — clearly a very bad thing. Second, Windows’ ACL GUI always sorts the ACL so DENY ACEs come first, which means that any masking algorithm that depends on interleaving ALLOW and DENY ACEs will be wrecked. In particular, if any DENY ACEs generated by a masking algorithm would deny the file’s owner access, then the situation becomes extremely confusing and painful for the user. And the Windows GUI will set the sorted ACL if you click OK! This used to be a problem with the old aclmode=groupmask behavior that was removed by the time Solaris 11 Express shipped.

One consequence of the new aclmode=mask behavior is that changing the ACL is a lossy operation when the object’s mode has been changed to a more restrictive mode since the last time the ACL was set. What is lost is the permission bits that have been masked by that new mode, such that changing the mode back to a more permissive mode will not restore those lost permission bits. In terms of the example given in the Oracle docs, if the ACL is modified between the chmod 640 and chmod 770 operations, then the chmod 770 operation will not restore any ACE permission bits. I think this is likely not to be a big deal for users in general. In terms that would be familiar to users accustomed to POSIX Draft ACL semantics: there is no mask entry as such, with the mode acting as the mask entry and ACL changes resulting in the mode being recomputed.
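
The masking and the lossiness can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how ZFS implements it: real ZFS ACLs carry NFSv4-style permission bits and several ACEs, whereas here a single ALLOW ACE for a named group is reduced to rwx bits (4/2/1) and masked by the group digit of the mode. The function names group_bits and effective_ace are hypothetical.

```shell
# Toy model of aclmode=mask (NOT the ZFS implementation): one ALLOW ACE
# with rwx bits (r=4, w=2, x=1), masked by the group digit of the mode.

# group_bits MODE: print the group digit of a 3-digit octal mode (640 -> 4)
group_bits() {
    echo $(( (0$1 >> 3) & 7 ))
}

# effective_ace STORED MODE: print the ACE permissions seen when the ACL is read
effective_ace() {
    echo $(( $1 & $(group_bits "$2") ))
}

stored=7                                        # ACE set to rwx while the mode was 770
echo "mode 770: $(effective_ace $stored 770)"
echo "mode 640: $(effective_ace $stored 640)"   # chmod only changes the mask
echo "mode 770: $(effective_ace $stored 770)"   # mask relaxed, the bits come back

# Lossy case: the ACL is read and written back while the mode is 640, so
# only the masked bits get stored; a later chmod 770 cannot restore them.
stored=$(effective_ace $stored 640)
echo "after ACL rewrite, mode 770: $(effective_ace $stored 770)"
```

In this model the first three lines show 7, 4 and 7, while the last shows 4: the write and execute bits were dropped when the masked ACL was stored.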

I hope users will like this new aclmode=mask setting. Users accustomed to POSIX Draft ACL semantics should feel right at home, though they’ll also note the lossiness I mention above. I’d love to get some feedback on this new feature in ZFS. I believe users will be generally happy with aclmode=mask, but I’d like to know for sure!

find | xargs idiom — or higher order shell functions

•October 11, 2011

Here’s a shell function I’ve been using for over a decade. It takes three arguments: the name of a function to define, the name of a filter program (e.g., ‘grep’), and a find(1) expression, quoted. It defines a function by the given name that evaluates to find <directory arguments> | xargs <filter> <remaining-arguments>.

Here’s an example of how to use it:

% fsgen fmg grep '\( -name \*akefile -o -name \* -o -name \* -o -name \*.m4 -o -name \*.ac \)'
% fmg . -- -O2

To make this work in KSH, simply s/declare -a/set -A/.

My mnemonic for these shell functions is “find <type of thing> <filter>”. For example, fsg means “find source grep”, while fseg means “find source egrep”, and fmg means “find makefiles grep”.

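fsgen () {
    typeset fname filter
    fname=$1 filter=$2; shift 2
    # What is left in $* is the find(1) expression; it is spliced into the
    # body of the generated function when that function is defined.
    eval "function $fname  {
    typeset fargs
    declare -a fargs
    while [[ \$# -gt 1 && \"\$1\" != -* ]]
    do fargs[\${#fargs[@]}]=\"\$1\"; shift; done
    [[ \"\$1\" = -- ]] && shift
    nsfind \"\${fargs[@]}\" $*  -print | xargs $filter \"\$@\"
    }"
}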

And the supporting nsfind function:

nsfind  () {
    typeset dargs i
    declare -a dargs
    # Leading arguments up to the first option or "!" are directories
    while [[ $# -gt 0 ]]
    do
        [[ "$1" = -* || "$1" = \! ]] && break
        dargs[${#dargs[@]}]="$1"
        shift
    done
    [[ ${#dargs[@]} -eq 0 ]] && dargs[${#dargs[@]}]=.
    [[ "$1" = -- ]] && shift
    if [[ $# -eq 0 ]]
    then
        find "${dargs[@]}" \( \( -name SCCS -o -name CVS -o -name .hg -o -name .git \) -type d -prune \) -o -print
        return $?
    else
        for i in "$@"
        do
            if [[ "X$i" = "X-print" ]]
            then
                find "${dargs[@]}" \( \( -name SCCS -o -name CVS -o -name .hg -o -name .git \) -type d -prune \) -o \( "$@" \)
                return $?
            fi
        done
        find "${dargs[@]}" \( \( -name SCCS -o -name CVS -o -name .hg -o -name .git \) -type d -prune \) -o \( \( "$@" \) -print \)
        return $?
    fi
}
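
Stripped of the version-control pruning, the function-generating idea can be sketched in a few lines of portable shell. This is a simplified illustration rather than a replacement for fsgen: the name mkfinder is hypothetical, the generated function takes exactly one directory argument, and like any plain find | xargs pipeline it will mishandle file names that contain whitespace.

```shell
# mkfinder NAME FILTER 'FIND-EXPR'
# Defines a function NAME such that "NAME dir args..." runs
#   find dir \( FIND-EXPR \) -print | xargs FILTER args...
# FIND-EXPR is spliced into the generated code, so quote it the way you
# would type it at a prompt (e.g. '-name \*.c').
mkfinder() {
    eval "$1() (
        dir=\$1; shift
        find \"\$dir\" \\( $3 \\) -print | xargs $2 \"\$@\"
    )"
}

# Example: define fcg, "find C sources, grep"
mkfinder fcg grep '-name \*.c'
# fcg src -n main    # would grep for "main" in every *.c file under src
```

The generated function uses a subshell body, ( ... ), so that its dir variable does not leak into the calling shell without relying on typeset or local.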

$EDITOR wrapper for use with cscope and screen(1)

•October 11, 2011

I often have a screen(1) session running with a cscope in window 0, and I want every file I edit from cscope to start in its own window with control returning to cscope immediately. I use a wrapper around my $EDITOR that does this. It’s quite simple.

This way I can search for things with cscope, pick a result, switch back to window 0, pick another, and so on, without blocking cscope.

The script, formatted with VIM’s TOhtml command, follows.


#!/bin/bash

# Makes for safer scripting
set -o noglob

# This script is to be used as the value of $EDITOR for cscope in a
# screen(1) session.  Selecting a file to edit should then cause a new
# screen window to open with the user’s preferred editor running in it
# to edit the selected file, all the while the script exits so that
# control can return to cscope.  This way the user can start many
# concurrent edits from one cscope session.

# Figure out what the real $EDITOR was intended to be from the current
# setting of $EDITOR or from this program’s name, removing any leading
# ‘s’ (since this program is "svim" or "s<whatever>").
# For example, calling this script "svim" causes it to start VIM;
# calling it "semacs" would cause it to start Emacs.
editor=${EDITOR:-$0}
editor=${editor##*/}
editor=${editor#s}
: ${editor:=vim}
export EDITOR=$editor


# Not in screen?  Punt.
[[ -z "$STY" ]] && exec "$editor" "$@"

# Find out if the parent is cscope
function is_parent_cscope {
    local IFS pid comm ppid j
    if [[ -f /proc/self/attr/current ]]; then
        # Linux fast path
        read pid comm j ppid j < /proc/self/stat
        read j comm j ppid j < /proc/$ppid/stat
        [[ "$comm" != *cscope* ]] && read j comm j j < /proc/$ppid/stat
        [[ "$comm" = *cscope* ]] && return 0
    elif [[ -f /kernel/genunix ]]; then
        # Solaris fast path
        ptree | grep cscope > /dev/null 2>&1 && return 0
    else
        # Slow path
        comm="$(ps -o comm= -p $PPID)"
    fi
    [[ "$comm" = *cscope* ]] && return 0
    return 1
}

# Don’t try to start a new window _unless_ it was cscope that spawned us
# (the script appears to exit immediately, which causes many tools that
# spawn $EDITOR to think the user did not change the file at all).  If
# other tools like cscope could benefit from this behavior just add them
# here:
if ! is_parent_cscope; then
        exec $editor "$@"
        printf 'Failed to exec $EDITOR (%s)\n' "$EDITOR" 1>&2
        exit 1
fi

# I often edit/view the same file from cscope in the same screen.  I
# don’t want to be prompted by VIM on the second, third, .., nth
# viewing.  This code decides whether to edit the file read-only.
use_roeditor=false
files=0
: ${roeditor:=view}     # assumed default: view(1), VIM's read-only mode
if [[ $EDITOR = vim ]]; then
    for arg in "$@"; do
        [[ "$arg" = -* || "$arg" = [+]* ]] && continue
        files=$((files + 1))
        [[ "$arg" = */* && -f "${arg%/*}/.${arg##*/}.swp" ]] && use_roeditor=true
        [[ "$arg" != */* && -f ".${arg}.swp" ]] && use_roeditor=true
    done
    # cscope doesn’t have us edit more than one file at a time
    ((files > 1)) && use_roeditor=false
fi

$use_roeditor && editor=$roeditor

# Figure out the title for the new screen(1) window
title=$editor
for arg in "$@"; do
        [[ "$arg" = -* || "$arg" = [+]* ]] && continue
        # Assumed: name the window after the file being edited
        title=${arg##*/}
done

# Start $EDITOR in a new screen window if we’re in a screen session.
# Note that screen in this mode will use IPC to ask the master screen
# process to start the new window, then the client screen process will
# exit, and since we exec it, we exit too.
[[ -n "$STY" && $# -gt 0 ]] && exec screen -h 1 -t "$title" "$editor" -X "$@"

# Fallback on not using screen
exec $editor "$@"
printf 'Failed to exec $EDITOR (%s)\n' "$EDITOR" 1>&2
exit 1
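
The Linux fast path in the script reads the command name and parent PID out of /proc/PID/stat with a plain read, which works as long as the command name contains no spaces. A slightly more defensive way to pull those two fields out of a stat line might look like the sketch below. The function name parse_stat is hypothetical, and it takes the line as an argument rather than reading /proc so that it can be tried anywhere.

```shell
# parse_stat LINE: print "COMM PPID" from a /proc/PID/stat line.
# The line has the form "pid (comm) state ppid ...", where comm may itself
# contain spaces or parentheses, so split on the first "(" and the last ")".
parse_stat() {
    line=$1
    rest=${line#*\(}        # drop "pid ("
    comm=${rest%\)*}        # keep everything up to the last ")"
    after=${line##*\) }     # "state ppid ..."
    set -- $after           # word-split the remaining fields
    printf '%s %s\n' "$comm" "$2"
}

parse_stat '4242 (cscope) S 4000 4242 4000 34816'
parse_stat '77 (my prog) R 1 77 77 0'
```

The two sample lines are made up for illustration; with them the function prints "cscope 4000" and "my prog 1".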

On Unicode Normalization — or why normalization insensitivity should be rule

•April 13, 2010

Do you know what Unicode normalization is? If you have to deal with Unicode, then you should know. Otherwise this blog post is not for you. Target audience: a) Internet protocol authors, reviewers, IETF WG chairs, the IESG, b) developers in general, particularly any working on filesystems, networked applications or text processing applications.

Short story: Unicode allows various characters to be written as a single “pre-composed” codepoint or a sequence of one character codepoint plus one or more combining codepoints. Think of ‘á’: it could be written as a single codepoint that corresponds to the ISO-8859-1 ‘á’ or as two codepoints, one being plain old ASCII ‘a’, and the other being the combining codepoint that says “add acute accent mark”. There are characters that can have five and more different representations in Unicode.
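
To make the ‘á’ example concrete, here is a small sketch that builds both representations from their UTF-8 bytes and compares them. The helper name nbytes is made up for this illustration; the byte values are the standard UTF-8 encodings of U+00E1 and of ‘a’ followed by U+0301.

```shell
# "á" as one precomposed codepoint, U+00E1 (UTF-8 bytes C3 A1)
precomposed=$(printf '\303\241')
# "á" as ASCII "a" plus the combining acute accent, U+0301 (UTF-8 bytes 61 CC 81)
decomposed=$(printf 'a\314\201')

# nbytes STRING: print the number of bytes in STRING
nbytes() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

echo "precomposed: $(nbytes "$precomposed") bytes"
echo "decomposed:  $(nbytes "$decomposed") bytes"

if [ "$precomposed" = "$decomposed" ]; then
    echo "byte-wise equal"
else
    echo "byte-wise different, although both render as the same character"
fi
```

A byte-wise comparison such as strcmp() or the shell’s = treats the two strings as different, which is exactly the problem that normalization is meant to address.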

The long version of the story is too long to go into here. If you find yourself thinking that Unicode is insane, then you need to acquaint yourself with that long story. There are good reasons why Unicode has multiple ways to represent certain characters; wishing it weren’t so won’t do.

Summary: Unicode normalization creates problems.

So far the approach most often taken in Internet standards to deal with Unicode normalization issues has been to pick a normalization form and then say you “MUST” normalize text to that form. This rarely gets implemented because the burden is too high. Let’s call this the “normalize-always” (‘n-a’, for short) model of normalization. Specifically, in the context of Internet / application protocols, the normalize-always model requires normalizing when: a) preparing query strings (typically on clients), b) creating storage strings (typically on servers). The normalize-always model typically results in all implementors having to implement Unicode normalization, regardless of whether they implement clients or servers.

Examples of protocols/specifications using n-a: stringprep, IMAP/LDAP/XMPP/… via SASL via SASLprep, Nameprep/IDNA (internationalized domain names), Net Unicode, e-mail headers, and many others.

I want to promote a better alternative to the normalize-always model: the normalization-insensitive / normalization-preserving (or ‘n-i/n-p‘, for short) model.

In the n-i/n-p model you normalize only when you absolutely have to for interoperability:

  • when comparing Unicode strings (e.g., query strings to storage strings);
  • when creating hash/b-tree/other-index keys from Unicode strings (hash/index lookups are like string comparisons);
  • when you need canonical inputs to cryptographic signature/MAC generation/validation functions.

That is far fewer times and places where strings need normalizing than in the n-a model. Moreover, in many or most protocols normalization can be left entirely to servers rather than clients, and simpler clients lead to better adoption rates. Easier adoption alone should be a sufficient advantage for the n-i/n-p model.

But it gets better too: the n-i/n-p model also provides better compatibility with and upgrade paths from legacy content. This is because in this model storage strings are not normalized on CREATE operations, which means that you can have Unicode and non-Unicode content co-existing side-by-side (though one should only do that as part of a migration to Unicode, as otherwise users can get confused).

The key to n-i/n-p is: fast n-i string comparison functions, as well as fast byte- or codepoint-at-a-time string normalization functions. By “fast” I mean that any time that two ASCII codepoints appear in sequence you have a fast path and can proceed to the next pair of codepoints starting with the second ASCII codepoint of the first pair. For all- or mostly-ASCII Unicode strings this fast path is not much slower than a typical for-every-byte loop. (Note that strcmp() optimizations such as loading and comparing 32 or 64 bits at a time apply to both, ASCII-only/8-bit-clean and n-i algorithms: you just need to check for any bytes with the high bit set, and whenever you see one you should trigger the slow path.) And, crucially, there’s no need for memory allocation when normalization is required in these functions: why build normalized copies of the inputs when all you’re doing is comparing or hashing them?
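As a rough illustration of the idea (and not the ZFS code), here is a toy normalization-insensitive comparison in C. Everything in it is an assumption made for the example: the names `ni_strcmp`, `ni_cursor` and `toy_table` are invented, the decomposition table has only three entries instead of the real Unicode data, inputs are assumed to be well-formed UTF-8, and canonical reordering of multiple combining marks is not handled. What it does show is the shape described above: an ASCII fast path that looks one byte ahead, a slow path that decomposes a codepoint at a time, and no memory allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy decomposition table: three entries only, NOT the real Unicode data. */
struct toy_decomp { uint32_t composed, base, mark; };
static const struct toy_decomp toy_table[] = {
    { 0x00E1, 'a', 0x0301 },  /* a-acute = a + combining acute */
    { 0x00E9, 'e', 0x0301 },  /* e-acute = e + combining acute */
    { 0x00F1, 'n', 0x0303 },  /* n-tilde = n + combining tilde */
};

/* Cursor over a NUL-terminated UTF-8 string; `pending` holds a combining
 * mark left over from decomposing the previous codepoint (0 if none). */
struct ni_cursor { const unsigned char *p; uint32_t pending; };

/* Decode one codepoint; assumes well-formed UTF-8 and always advances. */
static uint32_t decode_utf8(const unsigned char **pp)
{
    const unsigned char *p = *pp;
    uint32_t cp;
    int more;

    if (p[0] < 0x80)                { cp = p[0];        more = 0; }
    else if ((p[0] & 0xE0) == 0xC0) { cp = p[0] & 0x1F; more = 1; }
    else if ((p[0] & 0xF0) == 0xE0) { cp = p[0] & 0x0F; more = 2; }
    else                            { cp = p[0] & 0x07; more = 3; }
    p++;
    while (more-- > 0 && (*p & 0xC0) == 0x80)
        cp = (cp << 6) | (uint32_t)(*p++ & 0x3F);
    *pp = p;
    return cp;
}

/* Next codepoint of the decomposed form, or 0 at end of string. */
static uint32_t ni_next(struct ni_cursor *c)
{
    uint32_t cp;
    size_t i;

    if (c->pending != 0) { cp = c->pending; c->pending = 0; return cp; }
    if (*c->p == 0)
        return 0;
    cp = decode_utf8(&c->p);
    for (i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++) {
        if (toy_table[i].composed == cp) {
            c->pending = toy_table[i].mark;  /* emit the mark next time */
            return toy_table[i].base;
        }
    }
    return cp;
}

/* Returns <0, 0 or >0 like strcmp(), comparing decomposed codepoints. */
int ni_strcmp(const char *a, const char *b)
{
    struct ni_cursor ca, cb;
    uint32_t x, y;

    assert(a != NULL && b != NULL);
    ca.p = (const unsigned char *)a; ca.pending = 0;
    cb.p = (const unsigned char *)b; cb.pending = 0;
    for (;;) {
        /* Fast path: current and next byte are ASCII on both sides, so no
         * combining mark can follow the current character. */
        if (ca.pending == 0 && cb.pending == 0 &&
            ca.p[0] != 0 && cb.p[0] != 0 &&
            (ca.p[0] | ca.p[1] | cb.p[0] | cb.p[1]) < 0x80) {
            if (ca.p[0] != cb.p[0])
                return ca.p[0] < cb.p[0] ? -1 : 1;
            ca.p++; cb.p++;
            continue;
        }
        /* Slow path: decompose on the fly, no normalized copies built. */
        x = ni_next(&ca);
        y = ni_next(&cb);
        if (x != y)
            return x < y ? -1 : 1;
        if (x == 0)
            return 0;
    }
}
```

With this sketch, `ni_strcmp("caf\xC3\xA9", "cafe\xCC\x81")` treats the pre-composed and decomposed spellings of "café" as equal, while all-ASCII inputs never leave the fast path. A hash function over the same decomposed stream would follow the same pattern.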

We’ve implemented normalization-insensitive/preserving behavior in ZFS, controlled by a dataset property (see also; see also; rationale). This means that NFS clients on Solaris, Linux, MacOS X, *BSD, Windows will interop with each other through ZFS-backed NFS servers regardless of what Unicode normalization forms they use, if any, and without having to have modified the clients to normalize.

My proposal: a) update stringprep to allow for profiles that specify n-i/n-p behavior, b) update SASLprep and various other stringprep profiles (but NOT Nameprep, nor IDNA) to specify n-i/n-p behavior, c) update Net Unicode to specify n-i/n-p behavior while still allowing normalization on CREATE as an option, d) update any other protocols that use n-a and which would benefit from using n-i/n-p to use n-i/n-p.

Your reactions? I expect skepticism, but think carefully, and consider ZFS’s solution (n-i/n-p) in the face of competitors that either normalize on CREATE or don’t normalize at all, plus the fact that some operating systems tend to prefer NFC (e.g., Solaris, Windows, Linux, *BSD) while others prefer NFD (e.g., MacOS X). If you’d keep n-a, please explain why.

NOTE to Linus Torvalds (and various Linux developers) w.r.t. this old post on the git list: ZFS does not alter filenames on CREATE nor READDIR operations, ever [UPDATE: Apparently the port of ZFS to MacOS X used to normalize on CREATE to match HFS+ behavior]. ZFS supports case- and normalization-insensitive LOOKUPs, and that’s all (compare to HFS+, which normalizes to NFD on CREATE).

NOTE ALSO that mixing Unicode and non-Unicode strings can cause strange codeset aliasing effects, even in the n-i/n-p model (if there are valid byte sequences in non-Unicode codesets that can be confused with valid UTF-8 byte sequences involving pre-composed and combining codepoints). I’ve not studied this codeset aliasing issue, but I suspect that the chance of such collisions with meaningful filenames is remote, and if the filesystem is set up to reject non-UTF-8 filenames then the chance that users will be able to create non-UTF-8 filenames without realizing that most such names will be rejected is infinitesimally small. This problem is best avoided by disallowing the creation of invalid UTF-8 filenames; ZFS has an option for that.

UPDATE: Note also that in some protocols you have to normalize early for cryptographic reasons, such as in Kerberos V5 AS-REQs when not using client name canonicalization, or in TGS-REQs when not using referrals. However, it’s best to design protocols to avoid this.

Ever wanted to be able to write C function calls with arguments named in the call?

•January 11, 2010 • 4 Comments

Have you ever wished you could write

        result = foo(.arg3 = xyz, .arg2 = abc);

as you can in some programming languages? If not then you’ve probably not met an API with functions that have a dozen arguments, most of which take default values. Such APIs exist, and despite any revulsion you might feel about them, there are often good reasons for having so many parameters that can be defaulted.

Well, you can get pretty close to such syntax, actually. I don’t know why, but I thought of the following hack at 1AM last Thursday, trying to sleep:

        struct foo_args {
                type1 arg1;
                type2 arg2;
                type3 arg3;
        };

        #define CALL_W_NAMED_ARGS3(result, fname, ...) \
        do { \
                struct fname ## _args _a = { __VA_ARGS__ }; \
                (result) = fname(_a.arg1, _a.arg2, _a.arg3); \
        } while (0)

        CALL_W_NAMED_ARGS3(res, foo, .arg2 = xyz, .arg3 = abc);

This relies on C99 struct initializer syntax and variadic macros, but it works. Arguments with non-zero default values can still be initialized since C99 struct initializer syntax allows fields to be assigned more than once.
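Here is a small sketch of that non-zero-defaults trick. The `scale()` function, its argument struct and the `CALL_SCALE` macro are hypothetical stand-ins invented for this example: the defaults are listed first in the initializer and any designators in the call override them. Be aware that some compilers warn about the overridden initializers (GCC's -Woverride-init, clang's -Winitializer-overrides), even though the construct is valid C99.

```c
#include <assert.h>

/* Hypothetical argument struct for scale(); one field per parameter. */
struct scale_args {
    int value;
    int factor;
    int offset;
};

/* Hypothetical function standing in for an API with defaultable arguments. */
int scale(int value, int factor, int offset)
{
    return value * factor + offset;
}

/* Defaults come first; designators in __VA_ARGS__ override them, because
 * a later initializer for the same field replaces an earlier one. */
#define CALL_SCALE(result, ...) \
    do { \
        struct scale_args _a = { .factor = 1, .offset = 0, __VA_ARGS__ }; \
        (result) = scale(_a.value, _a.factor, _a.offset); \
    } while (0)

/* Usage: returns 1 if every call produced the expected value. */
int scale_demo(void)
{
    int r1, r2, r3;

    CALL_SCALE(r1, .value = 5);               /* factor defaults to 1 */
    CALL_SCALE(r2, .value = 5, .factor = 3);  /* default overridden */
    CALL_SCALE(r3, .offset = 2, .value = 4);  /* any order works */
    assert(r1 == 5);
    return r1 == 5 && r2 == 15 && r3 == 6;
}
```

The do/while form is used here so that the example stays within standard C99; the same defaults would work in the statement-expression variant.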

If you use GCC’s statement expressions extension (which, incidentally, is supported by Sun Studio), you can even do this:

        #define CALL_W_NAMED_ARGS3(fname, ...) \
        ({ \
                struct fname ## _args _a = { __VA_ARGS__ }; \
                fname(_a.arg1, _a.arg2, _a.arg3); \
        })

        res = CALL_W_NAMED_ARGS3(foo, .arg2 = xyz, .arg3 = abc);

You can even define a macro that allows you to call functions by pointer, provided you have suitable typedefs:

        struct foo_t_args {
                type1 arg1;
                type2 arg2;
                type3 arg3;
        };

        #define CALL_PTR_W_NAMED_ARGS3(ftype, func, ...) \
        ({ \
                struct ftype ## _args _a = { __VA_ARGS__ }; \
                func(_a.arg1, _a.arg2, _a.arg3); \
        })
        foo_t f = ...;
        res = CALL_PTR_W_NAMED_ARGS3(foo_t, f, .arg2 = xyz, .arg3 = abc);

Useful? I’m not sure that I’d use it. I did a brief search and couldn’t find anything on this bit of black magic, so I thought I should at least blog it.

What’s interesting though is that C99 introduced initializer syntax that allows out-of-order, named field references (with missing fields getting initialized to zero) for structs (and arrays) but not for function calls. The reason surely must be that while structs may not have unnamed fields, function prototypes not only may, but generally do, have unnamed parameters. (That, and maybe no one thought of function calls with arguments named in the call parameter list.)

Using DTrace to debug encrypted protocols

•August 24, 2009 • Leave a Comment

UPDATED: I hadn’t fully swapped in the context when I wrote this blog entry, and Jordan, the engineer working this bug, tells me that the primary problem is an incorrect interpretation of the security layers bitmask on the AD side. I describe that in detail at the end of the original post, and I’ve added links to the relevant RFCs.

A few months ago there was a bug report that the OpenSolaris CIFS server stack did not interop with Active Directory when “LDAP signing” was enabled. But packet captures, and truss/DTrace clearly showed that smbd/idmapd were properly encrypting and signing all LDAP traffic (when LDAP signing was disabled anyways), and with AES too. So, what gives?

Well, in the process of debugging the problem I realized that I needed to look at the cleartext of otherwise encrypted LDAP protocol data. Normally the way one would do this is to build a special version of the relevant library (the libsasl “gssapi” plugin, in this case) that prints the relevant cleartext. But that’s really obnoxious. There’s got to be a better way!

Well, there is. I’d already done this sort of thing in the past when debugging other interop bugs related to the Solaris Kerberos stack, and I’d done it with DTrace.

Let’s drill down the protocol stack. The LDAP clients in our case were using SASL/GSSAPI/Kerberos V5, with confidentiality protection “SASL security layers”, for network security. After looking at some AD docs I quickly concluded that “LDAP signing” clearly meant just that. So the next step was to look at the SASL/GSSAPI part of that stack. The RFC (originally RFC2222, now RFC4752) says that after exchanging the GSS-API Kerberos V5 messages [RFC4121] that set up a shared security context (session keys, …), the server sends a message to the client consisting of: a one-byte bitmask indicating what “security layers” the server supports (none, integrity protection, or confidentiality+integrity protection), and a 24-bit, network byte order maximum message size. But these four bytes are encrypted, so I couldn’t just capture packets and dissect them. The first order of business, then, was to extract these four bytes somehow.
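For reference, once the four cleartext bytes are in hand, decoding them as the RFC describes is simple. The following C sketch is illustrative only: the struct, the function name `sasl_gss_parse_offer` and the macro names are invented here, while the bit values (1 for no security layer, 2 for integrity, 4 for confidentiality) are the ones given in RFC 4752.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Security layer bits as assigned in RFC 4752 (macro names are made up). */
#define SASL_GSS_LAYER_NONE      0x01
#define SASL_GSS_LAYER_INTEGRITY 0x02
#define SASL_GSS_LAYER_CONF      0x04

/* Decoded form of the server's four-octet negotiation message. */
struct sasl_gss_offer {
    uint8_t  layers;    /* bitmask of layers the server supports */
    uint32_t max_size;  /* maximum message size, from a 24-bit field */
};

/* Returns 0 on success, -1 if the cleartext is shorter than 4 octets. */
int sasl_gss_parse_offer(const uint8_t *msg, size_t len,
                         struct sasl_gss_offer *out)
{
    assert(msg != NULL && out != NULL);
    if (len < 4)
        return -1;
    out->layers = msg[0];
    /* Octets 2 through 4 are the size in network (big-endian) order. */
    out->max_size = ((uint32_t)msg[1] << 16) |
                    ((uint32_t)msg[2] << 8) |
                    (uint32_t)msg[3];
    return 0;
}
```

For example, the octets 07 01 00 00 decode to all three layers offered and a maximum message size of 65536 bytes.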

I resorted to DTrace. Since the data in question is in user-land, I had to use copyin() and hand-coded pointer traversal. The relevant function, gss_unwrap(), takes a pointer to a gss_buffer_desc struct that points to the ciphertext, and a pointer to another gss_buffer_desc where the pointer to the cleartext will be stored. The script:

        #!/usr/sbin/dtrace -Fs

        /*
         * NOTE: the probe descriptions and braces were missing from this
         * listing; the pid$target probes below are a reconstruction inferred
         * from the arguments used (run with -p <pid>).  The 4- and 8-byte
         * copyin() sizes assume a 32-bit process.
         */

        /*
         * If we establish a sec context, then the next unwrap
         * is of interest.
         */
        pid$target::gss_init_sec_context:return
        {
                self->trace_unwrap = 1;
                self->trace_wrap = 1;
        }

        pid$target::gss_unwrap:entry
        /self->trace_unwrap/
        {
                /* Trace the ciphertext */
                this->gss_wrapped_bufp = arg2;
                this->buflen = *(unsigned int *)copyin(this->gss_wrapped_bufp, 4);
                this->bufp = *(unsigned int *)copyin(this->gss_wrapped_bufp + 4, 4);
                this->buf = copyin(this->bufp, 32);
                tracemem(this->buf, 32);

                /* Remember where the cleartext will go */
                self->gss_bufp = arg3;
                printf("unwrapped token will be in a gss_buffer_desc at %p\n", arg3);
                this->gss_buf = copyin(self->gss_bufp, 8);
                tracemem(this->gss_buf, 8);
        }

        /*
         * Now grab the cleartext and print it.
         */
        pid$target::gss_unwrap:return
        /self->trace_unwrap && self->gss_bufp/
        {
                this->gss_buf = copyin(self->gss_bufp, 8);
                tracemem(this->gss_buf, 8);
                this->buflen = *(unsigned int *)copyin(self->gss_bufp, 4);
                self->bufp = *(unsigned int *)copyin(self->gss_bufp + 4, 4);
                printf("\nServer wrap token was %d bytes long; data at %p (%p)\n",
                    this->buflen, self->bufp, self->gss_bufp);
                this->buf = copyin(self->bufp, 4);
                self->trace_unwrap = 0;
                printf("Server wrap token data: %d\n", *(int *)this->buf);
                tracemem(this->buf, 4);
        }

        /*
         * Do the same for the client's reply to the
         * server's security layers and max message
         * size negotiation offer.
         */
        pid$target::gss_wrap:entry
        /self->trace_wrap/
        {
                self->trace_wrap = 0;
                self->trace_unwrap = 0;
                this->gss_bufp = arg4;
                this->buflen = *(unsigned int *)copyin(this->gss_bufp, 4);
                this->bufp = *(unsigned int *)copyin(this->gss_bufp + 4, 4);
                this->buf = copyin(this->bufp, 4);
                printf("Client reply is %d bytes long: %d\n", this->buflen,
                    *(int *)this->buf);
                tracemem(this->buf, 4);
        }
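The hard-coded 4 and 8 in the copyin() calls come from the layout of gss_buffer_desc, which the GSS-API C bindings define as a `size_t length` followed by a `void *value`. In a 32-bit process, which is what those sizes imply, that is a 4-byte length at offset 0 and a 4-byte pointer at offset 4. The C sketch below mirrors that layout with fixed-width fields; `gss_buffer_desc32` and `decode_gss_buffer32` are names invented for this illustration, and the decoding assumes the raw bytes are in the host's byte order, as they are when DTrace runs on the same machine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Image of gss_buffer_desc { size_t length; void *value; } as it would be
 * laid out in a 32-bit (ILP32) process.  Hypothetical helper type. */
struct gss_buffer_desc32 {
    uint32_t length;  /* offset 0: number of bytes in the buffer */
    uint32_t value;   /* offset 4: user-land address of the data */
};

/* Decode the 8 raw bytes that copyin(gss_bufp, 8) would return. */
void decode_gss_buffer32(const unsigned char raw[8],
                         struct gss_buffer_desc32 *out)
{
    assert(raw != NULL && out != NULL);
    memcpy(&out->length, raw, 4);     /* same as copyin(gss_bufp, 4) */
    memcpy(&out->value, raw + 4, 4);  /* same as copyin(gss_bufp + 4, 4) */
}
```

Tracing a 64-bit process would need 8-byte fields and offsets 0 and 8 instead, which is one reason this kind of hand-coded pointer traversal is fragile.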

Armed with this script I could see that AD was offering all three security layer options, or only confidentiality protection, depending on whether LDAP signing was enabled. So far so good. The max message size offered was 10MB. 10MB! That’s enormous, and fishy. I immediately suspected an endianness bug. 10MB flipped around would be… 40KB, which makes much more sense (our client’s default is 64KB). And what is 64KB interpreted as? All possible interpretations will surely be nonsensical to AD: 16MB, 256, or 1 byte.
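The arithmetic behind those numbers can be checked with a few lines of C. This is only a sketch of candidate misreadings, not a statement of what AD's code actually does: `read_be24` is the reading the RFC specifies, while `read_le24` and `read_le32_masked` (all three names invented here) are two hypothetical wrong readings that produce the 1-byte, 256-byte and 40KB figures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Correct reading: octets 2..4 as a 24-bit big-endian integer. */
uint32_t read_be24(const uint8_t msg[4])
{
    assert(msg != NULL);
    return ((uint32_t)msg[1] << 16) | ((uint32_t)msg[2] << 8) | msg[3];
}

/* Hypothetical misreading: octets 2..4 as a 24-bit little-endian integer. */
uint32_t read_le24(const uint8_t msg[4])
{
    assert(msg != NULL);
    return msg[1] | ((uint32_t)msg[2] << 8) | ((uint32_t)msg[3] << 16);
}

/* Hypothetical misreading: all four octets as a 32-bit little-endian
 * word, with the low octet (the layer bitmask) masked off. */
uint32_t read_le32_masked(const uint8_t msg[4])
{
    assert(msg != NULL);
    return ((uint32_t)msg[1] << 8) | ((uint32_t)msg[2] << 16) |
           ((uint32_t)msg[3] << 24);
}
```

Fed the octets 07 A0 00 00, `read_be24` gives 10485760 (10MB) and `read_le32_masked` gives 40960 (40KB). Fed a client reply of 04 01 00 00, the correct reading is 65536 (64KB), while the two misreadings give 1 and 256.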

Armed with a hypothesis, I needed more evidence. DTrace helped yet again. This time I used copyout to change the client’s response to the server’s security layer and max message size negotiation message. And lo and behold, it worked. The script:

        #!/usr/sbin/dtrace -wFs

        /*
         * NOTE: probe descriptions, braces and self->modified are reconstructed
         * (assumed pid$target probes, 32-bit process); the listing had lost them.
         */
        BEGIN
        {
                self->trace_unwrap = 0;
                printf("This script is an attempted workaround for a possible interop bug in Windows Active Directory: ");
                printf("if LDAP signing and sealing is enabled and idmapd fails to connect normally but succeeds when this script is used, ");
                printf("then AD has an endianness interop bug in its SASL/GSSAPI implementation\n");
        }

        /*
         * We're looking to modify the SASL/GSSAPI client security layer and max
         * buffer selection.  That happens in the first wrap token sent after
         * establishing a sec context.
         */
        pid$target::gss_init_sec_context:return
        {
                self->trace_unwrap = 1;
        }

        /* This is that call to gss_wrap() */
        pid$target::gss_wrap:entry
        /self->trace_unwrap/
        {
                self->trace_wrap = 0;
                self->trace_unwrap = 0;
                this->gss_bufp = arg4;
                this->buflen = *(unsigned int *)copyin(this->gss_bufp, 4);
                this->bufp = *(unsigned int *)copyin(this->gss_bufp + 4, 4);
                this->sec_layer = *(char *)copyin(this->bufp, 1);
                this->maxbuf_msb = (char *)copyin(this->bufp + 1, 1);
                this->maxbuf_mid = (char *)copyin(this->bufp + 2, 1);
                this->maxbuf_lsb = (char *)copyin(this->bufp + 3, 1);
                printf("The client wants to select: sec_layer = %d, max buffer = %d\n",
                    this->sec_layer,
                    ((*this->maxbuf_msb & 0xff) << 16) +
                    ((*this->maxbuf_mid & 0xff) << 8) +
                    (*this->maxbuf_lsb & 0xff));
                /* Now fix it so it matches what we've seen AD advertise */
                *this->maxbuf_msb = 0xa0;
                *this->maxbuf_mid = 0;
                *this->maxbuf_lsb = 0;
                copyout(this->maxbuf_msb, this->bufp + 1, 1);
                copyout(this->maxbuf_mid, this->bufp + 2, 1);
                copyout(this->maxbuf_lsb, this->bufp + 3, 1);
                printf("Modified the client's SASL/GSSAPI max buffer selection\n");
                self->modified = 1;
        }

        /*
         * These wrap tokens will be for the security layer -- if we see these
         * then idmapd and AD are happy together
         */
        pid$target::gss_unwrap:entry
        /self->modified/
        {
                self->modified = 0;
                printf("It worked!  AD has an endianness interop bug in its SASL/GSSAPI implementation -- tell them to read RFC4752\n");
        }

Yes, DTrace is unwieldy when dealing with user-land C data (and no doubt it’s even more so for high level language data). But it does the job!

Separately from the endianness issue, AD also misinterprets the security layers bitmask. The RFC is clear, in my opinion, though it takes careful reading (so maybe it’s “clear”), that this bitmask is a mask of one, two or three bits set when sent by the server, but a single bit when sent by the client. It’s also clear, if one follows the chain of documents, that “confidentiality protection” means “confidentiality _and_ integrity protection” in this context (again, perhaps I should say “clear”). The real problem is that the RFC is written in English, not in English-technicalese, saying this about the bitmask sent by the server:

The client passes this token to GSS_Unwrap and interprets
the first octet of resulting cleartext as a bit-mask specifying the
security layers supported by the server and the second through fourth
octets as the maximum size output_message to send to the server.

and this about the bitmask sent by the client:

client then constructs data, with the first octet containing the
bit-mask specifying the selected security layer, the second through
fourth octets containing in network byte order the maximum size
output_message the client is able to receive, and the remaining
octets containing the authorization identity.

Note that “security layers” is plural in the first case, singular in the second.

Note too that for GSS-API mechanisms GSS_Wrap/Unwrap() always do integrity protection — only confidentiality protection is optional. But RFCs 2222/4752 say nothing of this, so that only an expert in the GSS-API would have known this. AD expects the client to send 0x06 as the bitmask when the server is configured to require LDAP signing and sealing. Makes sense: 0x04 is “confidentiality protection” (“sealing”) and 0x02 is “integrity protection” (“signing”). But other implementations would be free to consider that an error, which means that we have an interesting interop problem… And, given the weak language of RFCs 2222/4752, this mistake seems entirely reasonable, even if it is very unfortunate.
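To pin down the difference between the two readings, here is a small C sketch of the strict interpretation described above: the server may set one, two or three bits, but the client must select exactly one of the offered layers. The function names are invented for this example and the check reflects my reading of RFC 4752, not any particular implementation.

```c
#include <assert.h>
#include <stdint.h>

/* True if exactly one of the three defined layer bits (1, 2, 4) is set. */
int is_single_layer(uint8_t mask)
{
    return mask != 0 &&
           (mask & (mask - 1)) == 0 &&  /* power of two: one bit set */
           (mask & ~0x07) == 0;         /* no undefined bits */
}

/* Strict check of a client's selection against the server's offer. */
int client_selection_ok(uint8_t server_offer, uint8_t client_choice)
{
    assert((server_offer & ~0x07) == 0);
    return is_single_layer(client_choice) &&
           (client_choice & server_offer) != 0;
}
```

Under this check a client reply of 0x04 against an offer of 0x07 is accepted, while the 0x06 that AD expects is rejected, which is the interop problem in a nutshell.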