Using DTrace to debug encrypted protocols

(UPDATED: I hadn’t fully swapped in the context when I wrote this blog entry. Jordan, the engineer working this bug, tells me that the primary problem is an incorrect interpretation of the security layers bitmask on the AD side. I describe that in detail at the end of the original post, and I’ve added links to the relevant RFCs.)

A few months ago there was a bug report that the OpenSolaris CIFS server stack did not interop with Active Directory when “LDAP signing” was enabled. But packet captures and truss/DTrace clearly showed that smbd/idmapd were properly encrypting and signing all LDAP traffic (when LDAP signing was disabled, anyway), and with AES too. So, what gives?

Well, in the process of debugging the problem I realized that I needed to look at the cleartext of otherwise encrypted LDAP protocol data. Normally the way one would do this is to build a special version of the relevant library (the libsasl “gssapi” plugin, in this case) that prints the relevant cleartext. But that’s really obnoxious. There’s got to be a better way!

Well, there is. I’d already done this sort of thing in the past when debugging other interop bugs related to the Solaris Kerberos stack, and I’d done it with DTrace.

Let’s drill down the protocol stack. The LDAP clients in our case were using SASL/GSSAPI/Kerberos V5, with confidentiality protection “SASL security layers”, for network security. After looking at some AD docs I quickly concluded that “LDAP signing” clearly meant just that. So the next step was to look at the SASL/GSSAPI part of that stack. The RFC (originally RFC2222, now RFC4752) says that after exchanging the GSS-API Kerberos V5 messages [RFC4121] that set up a shared security context (session keys, …), the server sends a message to the client consisting of a one-byte bitmask indicating what “security layers” the server supports (none, integrity protection, or confidentiality+integrity protection) and a 24-bit, network byte order maximum message size. But these four bytes are encrypted, so I couldn’t just capture packets and dissect them. The first order of business, then, was to extract these four bytes somehow.
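To make the four-byte layout concrete, here is a minimal sketch in Python of how the cleartext could be decoded once it is in hand. The function names and the sample token are made up for illustration; only the bit values (1, 2, 4) and the octet layout come from RFC4752.

```python
import struct

# Security layer bits as defined by RFC 4752 (SASL GSSAPI mechanism).
LAYER_NONE = 0x01
LAYER_INTEGRITY = 0x02
LAYER_CONFIDENTIALITY = 0x04


def parse_negotiation_token(cleartext: bytes):
    """Split the first four cleartext octets into (bitmask, max_size)."""
    if len(cleartext) < 4:
        raise ValueError("need at least four octets")
    # "!I" is an unsigned 32-bit integer in network (big-endian) order.
    (word,) = struct.unpack("!I", cleartext[:4])
    bitmask = word >> 24          # first octet
    max_size = word & 0xFFFFFF    # second through fourth octets
    return bitmask, max_size


def layer_names(bitmask: int):
    """List the security layers whose bits are set in the bitmask."""
    names = []
    if bitmask & LAYER_NONE:
        names.append("none")
    if bitmask & LAYER_INTEGRITY:
        names.append("integrity")
    if bitmask & LAYER_CONFIDENTIALITY:
        names.append("confidentiality")
    return names


if __name__ == "__main__":
    # Hypothetical server offer: all three layers, 64KB max message size.
    offer = bytes([0x07, 0x01, 0x00, 0x00])
    bitmask, max_size = parse_negotiation_token(offer)
    print(layer_names(bitmask), max_size)
```

None of this helps, of course, until the cleartext has actually been extracted from the running process.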

I resorted to DTrace. Since the data in question is in user-land, I had to use copyin() and hand-code the pointer traversal. The relevant function, gss_unwrap(), takes a pointer to a gss_buffer_desc struct that points to the ciphertext, and a pointer to another gss_buffer_desc where the pointer to the cleartext will be stored. The script:

#!/usr/sbin/dtrace -Fs
/*
 * Note: the offsets below assume a 32-bit target process, where a
 * gss_buffer_desc is a 4-byte length followed by a 4-byte pointer.
 */

/*
 * If we establish a sec context, then the next unwrap
 * is of interest.
 */
pid$1::gss_init_sec_context:return
{
    ustack();
    self->trace_unwrap = 1;
}

pid$1::gss_unwrap:entry
/self->trace_unwrap/
{
    ustack();
    /* The client's reply (the next gss_wrap) is of interest too */
    self->trace_wrap = 1;
    /* Trace the ciphertext */
    this->gss_wrapped_bufp = arg2;
    this->buflen = *(unsigned int *)copyin(this->gss_wrapped_bufp, 4);
    this->bufp = *(unsigned int *)copyin(this->gss_wrapped_bufp + 4, 4);
    this->buf = copyin(this->bufp, 32);
    tracemem(this->buf, 32);
    /* Remember where the cleartext will go */
    self->gss_bufp = arg3;
    printf("unwrapped token will be in a gss_buffer_desc at %p\n", arg3);
    this->gss_buf = copyin(self->gss_bufp, 8);
    tracemem(this->gss_buf, 8);
}

/*
 * Now grab the cleartext and print it.
 */
pid$1::gss_unwrap:return
/self->trace_unwrap && self->gss_bufp/
{
    this->gss_buf = copyin(self->gss_bufp, 8);
    tracemem(this->gss_buf, 8);
    this->buflen = *(unsigned int *)copyin(self->gss_bufp, 4);
    self->bufp = *(unsigned int *)copyin(self->gss_bufp + 4, 4);
    printf("\nServer wrap token was %d bytes long; data at %p (%p)\n",
        this->buflen, self->bufp, self->gss_bufp);
    this->buf = copyin(self->bufp, 4);
    self->trace_unwrap = 0;
    /* Printed in host byte order; the raw octets follow via tracemem() */
    printf("Server wrap token data: %d\n", *(int *)this->buf);
    tracemem(this->buf, 4);
}

/*
 * Do the same for the client's reply to the
 * server's security layers and max message
 * size negotiation offer.
 */
pid$1::gss_wrap:entry
/self->trace_wrap/
{
    ustack();
    self->trace_wrap = 0;
    self->trace_unwrap = 0;
    this->gss_bufp = arg4;
    this->buflen = *(unsigned int *)copyin(this->gss_bufp, 4);
    this->bufp = *(unsigned int *)copyin(this->gss_bufp + 4, 4);
    this->buf = copyin(this->bufp, 4);
    printf("Client reply is %d bytes long: %d\n", this->buflen,
        *(int *)this->buf);
    tracemem(this->buf, 4);
}
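
The eight bytes that the script dumps with tracemem() for each gss_buffer_desc are just a length followed by a pointer. As a rough aid for reading such dumps, here is a small Python sketch that decodes them. The function name, the default of a 32-bit little-endian process, and the sample bytes are all assumptions for illustration; a 64-bit process would have 8-byte fields, and the script's hard-coded offsets of 4 would need to change to match.

```python
import struct


def decode_gss_buffer_desc(raw: bytes, pointer_bits: int = 32,
                           byteorder: str = "little"):
    """Decode a raw gss_buffer_desc dump into (length, value_pointer).

    The struct is { size_t length; void *value; }, so both fields are
    4 bytes wide in a 32-bit process and 8 bytes wide in a 64-bit one.
    """
    field = {32: "I", 64: "Q"}[pointer_bits]
    prefix = {"little": "<", "big": ">"}[byteorder]
    fmt = prefix + field * 2
    size = struct.calcsize(fmt)
    if len(raw) < size:
        raise ValueError("dump is shorter than the struct")
    length, value = struct.unpack(fmt, raw[:size])
    return length, value


if __name__ == "__main__":
    # Hypothetical 8-byte dump from a 32-bit little-endian process.
    dump = bytes.fromhex("0400000078563412")
    length, value = decode_gss_buffer_desc(dump)
    print("length=%d value=0x%x" % (length, value))
```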

Armed with this script I could see that AD was offering all three security layer options, or only confidentiality protection, depending on whether LDAP signing was enabled. So far so good. The max message size offered was 10MB. 10MB! That’s enormous, and fishy. I immediately suspected an endianness bug. 10MB flipped around would be… 40KB, which makes much more sense; our client’s default is 64KB. And what is 64KB interpreted as? All possible interpretations will surely be nonsensical to AD: 16MB, 256, or 1 byte.
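
A quick way to sanity-check that arithmetic is sketched below in Python. The helper names are made up, and the particular misreadings tried (reversing the three size octets, reading the whole four-octet token as a little-endian word, shifting the size up by one octet) are assumptions about how the octets might get scrambled, not a statement of what AD actually does.

```python
def token(bitmask: int, max_size: int) -> bytes:
    """Build the four-octet token: bitmask, then 24-bit big-endian size."""
    return bytes([bitmask]) + max_size.to_bytes(3, "big")


def misreadings(tok: bytes) -> dict:
    """Return the correct size plus a few plausible misinterpretations."""
    size_octets = tok[1:4]
    correct = int.from_bytes(size_octets, "big")
    return {
        "correct": correct,
        # The three size octets read in the wrong byte order.
        "size_reversed": int.from_bytes(size_octets, "little"),
        # All four octets (bitmask included) read as a little-endian word.
        "word_little_endian": int.from_bytes(tok, "little"),
        # The size landing one octet too high.
        "size_shifted_left": correct << 8,
    }


if __name__ == "__main__":
    # 64KB with an all-zero bitmask octet, to keep the numbers clean.
    print(misreadings(token(0x00, 64 * 1024)))
    # 40KB with all three layers offered.
    print(misreadings(token(0x07, 40 * 1024)))
```

With these inputs, 64KB comes out as 1, 256, or 16MB depending on the misreading, and 40KB shifted up one octet is exactly 10MB (read as a little-endian word together with a 0x07 bitmask it is 10MB plus 7).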

Armed with a hypothesis, I needed more evidence. DTrace helped yet again. This time I used copyout to change the client’s response to the server’s security layer and max message size negotiation message. And lo and behold, it worked. The script:

#!/usr/sbin/dtrace -wFs
BEGIN
{
    printf("This script is an attempted workaround for a possible interop bug in Windows Active Directory: if LDAP signing and sealing is enabled and idmapd fails to connect normally but succeeds when this script is used, then AD has an endianness interop bug in its SASL/GSSAPI implementation\n");
}

/*
 * We're looking to modify the SASL/GSSAPI client security layer and max
 * buffer selection.  That happens in the first wrap token sent after
 * establishing a sec context.
 */
pid$1::gss_init_sec_context:return
{
    self->trace_wrap = 1;
    self->modified = 0;
}

/*
 * These wrap tokens will be for the security layer -- if we see these
 * then idmapd and AD are happy together.  This clause is listed before
 * the one that does the modification so that it cannot fire on the
 * negotiation reply itself (clauses for a probe run in program order).
 */
pid$1::gss_wrap:entry
/self->modified/
{
    printf("It worked!  AD has an endianness interop bug in its SASL/GSSAPI implementation -- tell them to read RFC4752\n");
}

/* This is that call to gss_wrap() */
pid$1::gss_wrap:entry
/self->trace_wrap/
{
    ustack();
    self->trace_wrap = 0;
    self->modified = 1;
    this->gss_bufp = arg4;
    /* 32-bit process: 4-byte length followed by 4-byte data pointer */
    this->buflen = *(unsigned int *)copyin(this->gss_bufp, 4);
    this->bufp = *(unsigned int *)copyin(this->gss_bufp + 4, 4);
    this->sec_layer = *(unsigned char *)copyin(this->bufp, 1);
    this->maxbuf_msb = (unsigned char *)copyin(this->bufp + 1, 1);
    this->maxbuf_mid = (unsigned char *)copyin(this->bufp + 2, 1);
    this->maxbuf_lsb = (unsigned char *)copyin(this->bufp + 3, 1);
    /* Parenthesized: + binds tighter than << */
    printf("The client wants to select: sec_layer = %d, max buffer = %d\n",
        (int)this->sec_layer,
        (*this->maxbuf_msb << 16) +
        (*this->maxbuf_mid << 8) +
        *this->maxbuf_lsb);
    /* Now fix it so it matches what we've seen AD advertise */
    *this->maxbuf_msb = 0xa0;
    *this->maxbuf_mid = 0;
    *this->maxbuf_lsb = 0;
    copyout(this->maxbuf_msb, this->bufp + 1, 1);
    copyout(this->maxbuf_mid, this->bufp + 2, 1);
    copyout(this->maxbuf_lsb, this->bufp + 3, 1);
    printf("Modified the client's SASL/GSSAPI max buffer selection\n");
}

Yes, DTrace is unwieldy when dealing with user-land C data (and no doubt it’s even more so for high level language data). But it does the job!

Separately from the endianness issue, AD also misinterprets the security layers bitmask. The RFC is clear, in my opinion, though it takes careful reading (so maybe it’s “clear”), that this bitmask is a mask of one, two or three bits set when sent by the server, but a single bit when sent by the client. It’s also clear, if one follows the chain of documents, that “confidentiality protection” means “confidentiality _and_ integrity protection” in this context (again, perhaps I should say “clear”). The real problem is that the RFC is written in English, not in English-technicalese, saying this about the bitmask sent by the server:

The client passes this token to GSS_Unwrap and interprets
the first octet of resulting cleartext as a bit-mask specifying the
security layers supported by the server and the second through fourth
octets as the maximum size output_message to send to the server.

and this about the bitmask sent by the client:

The
client then constructs data, with the first octet containing the
bit-mask specifying the selected security layer, the second through
fourth octets containing in network byte order the maximum size
output_message the client is able to receive, and the remaining
octets containing the authorization identity.

Note that “security layers” is plural in the first case, singular in the second.

Note too that for GSS-API mechanisms GSS_Wrap/Unwrap() always do integrity protection — only confidentiality protection is optional. But RFCs 2222/4752 say nothing of this, so that only an expert in the GSS-API would have known this. AD expects the client to send 0x06 as the bitmask when the server is configured to require LDAP signing and sealing. Makes sense: 0x04 is “confidentiality protection” (“sealing”) and 0x02 is “integrity protection” (“signing”). But other implementations would be free to consider that an error, which means that we have an interesting interop problem… And, given the weak language of RFCs 2222/4752, this mistake seems entirely reasonable, even if it is very unfortunate.
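
To illustrate why 0x06 is an interop hazard, here is a minimal Python sketch of how a strict peer might validate the client's selection. The function names and the strict policy itself are hypothetical, one reading of RFC4752 rather than the behavior of any particular implementation; only the bit values come from the RFC.

```python
# Security layer bits as defined by RFC 4752.
LAYER_NONE = 0x01
LAYER_INTEGRITY = 0x02
LAYER_CONFIDENTIALITY = 0x04


def is_single_layer(bitmask: int) -> bool:
    """True if exactly one of the three defined layer bits is set."""
    return bitmask in (LAYER_NONE, LAYER_INTEGRITY, LAYER_CONFIDENTIALITY)


def strict_client_choice_ok(server_offer: int, client_choice: int) -> bool:
    """Strict reading: the client picks one layer, and the server offered it."""
    return is_single_layer(client_choice) and bool(client_choice & server_offer)


if __name__ == "__main__":
    offer = LAYER_CONFIDENTIALITY  # e.g. a server requiring signing and sealing
    # A client selecting only "confidentiality" passes the strict check...
    print(strict_client_choice_ok(offer, 0x04))
    # ...but the 0x06 that AD expects has two bits set and is rejected.
    print(strict_client_choice_ok(offer, 0x06))
```

A peer applying a check like this would reject exactly the value AD wants to see, which is the interop problem in a nutshell.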

~ by nico on August 24, 2009.
