A definitive treatise on coping with DLL hell (in general, not just in the Windows world whence the name came) would be nice.
Nowadays, and in the Unix world too, DLL hell is what you get when a single process loads and runs (or tries to) two or more versions of the same shared object at the same time, or when multiple versions of the same shared object exist on the system and the wrong one (from the point of view of a caller in that process) gets loaded. This can happen for several reasons, and when it does the results tend to be spectacular.
Typically DLL hell can result when:
- multiple versions of the same shared object are shipped by the same product/OS vendor as an accident of development in a very large organization or of political issues;
- multiple versions of the same shared object are shipped by the same product/OS vendor as a result of incompatible changes made in various versions of that shared object without corresponding updates to all consumers of that shared object shipped by the vendor (this is really just a variant of the previous case);
- a third party ships a plug-in that uses its own copy of the shared object, and that copy conflicts with the one shipped by the vendor of the product into which the plug-in plugs in, or such a conflict arises later when the vendor begins to ship that shared object (this is not uncommon in the world of open source, where some project becomes very popular and eventually every OS must include it).
At first glance the obvious answer is to get all developers, at the vendor and at third parties, to ship updates that remove the conflict by ensuring that a single version, shipped by the vendor, will be used. But in practice this can be really difficult to do because: a) there are too many parties to coordinate with, none of whom budgeted for DLL hell surprises, and none of whom appreciate the surprise or want to do anything about it when another party could do something instead; b) agreeing on a single version of said object may involve a lot of development to ensure that all consumers can use the chosen version; c) there’s always the risk that future consumers of this shared object will want a new, backwards-incompatible version of that object, which means that DLL hell is never-ending.
Ideally libraries should be designed so that DLL hell is reasonably survivable. But this too is not necessarily easy, and requires much help from the language run-time or run-time linker/loader. I wonder how far such an approach could take us.
Consider a library like SQLite3. As long as each consumer’s symbol references to SQLite3 APIs are bound to the correct version of SQLite3, there should be no problem, right? I think that’s almost correct, just not quite. Specifically, SQLite3 relies on POSIX advisory file locking, and if you read the comments on that in the src/os_unix.c file in the SQLite3 sources, you’ll quickly realize that yes, you can have multiple versions of SQLite3 in one process, provided that they are not accessing the same database files!
In other words, multiple versions of some library can co-exist in one process provided that there’s no implied, unexpected shared state between them that could cause corruption.
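To make the locking hazard concrete, here is a minimal sketch in Python (not SQLite3's code; the function name is made up for illustration). It relies on the POSIX rule that a process's record locks on a file are all dropped when that process closes any descriptor for the file. The two "copies" of a library are simulated by two independent opens of one temporary file, and a forked child checks whether the lock is still held.

```python
import fcntl
import os
import tempfile


def lock_survives_unrelated_close() -> bool:
    """Hypothetical helper: report whether a POSIX record lock taken through
    one descriptor is still held after the same process closes a different
    descriptor for the same file."""
    fd, path = tempfile.mkstemp()
    try:
        # "Library copy A" takes an exclusive advisory lock on the file.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)

        # "Library copy B" knows nothing about A; it opens and closes the
        # same file through its own descriptor.
        other = os.open(path, os.O_RDWR)
        os.close(other)

        # Locks are per process, so ask a child process to try the lock.
        pid = os.fork()
        if pid == 0:
            status = 0
            try:
                child_fd = os.open(path, os.O_RDWR)
                fcntl.lockf(child_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                status = 1  # acquired: the parent's lock is gone
            except OSError:
                status = 0  # refused: the parent's lock is still held
            os._exit(status)

        _, wait_status = os.waitpid(pid, 0)
        child_got_lock = os.WEXITSTATUS(wait_status) == 1
        return not child_got_lock
    finally:
        os.close(fd)
        os.unlink(path)
```

On a POSIX system I would expect this to return False: copy A still believes it holds the lock, but copy B's close has silently released it. A single copy of a library can track all of its own descriptors to work around this; two copies in one process cannot see each other's bookkeeping.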
What sorts of such implied, unexpected shared state might there be? Objects named after the process’ PID come to mind, for example (pidfiles, …). And POSIX advisory file locking (see above). What else? Imagine a utility function that looks through the process’ open file descriptors looking for ones that the library owns — oops, but at least that’s not very likely. Any process-local namespace that is accessible by all objects in that process will provide a source of conflicts. Fortunately thread-specific keys are safe.
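The PID-based naming problem can be sketched as follows. This is only an illustration under assumed names (`pid_scoped_name`, `LibraryInstance` and the `mylib` prefix are all hypothetical): a name derived from the PID alone is identical for every copy of a library in the process, whereas mixing in a per-instance token keeps the copies apart.

```python
import os
import uuid


def pid_scoped_name(prefix: str) -> str:
    """Name derived only from the PID: every copy of the library loaded
    into this process computes the same name and so shares the object."""
    return f"{prefix}.{os.getpid()}"


class LibraryInstance:
    """Stand-in for one loaded copy of a library (hypothetical)."""

    def __init__(self) -> None:
        # Random token generated once per copy, so that names created by
        # this copy cannot collide with names created by another copy.
        self.token = uuid.uuid4().hex

    def scoped_name(self, prefix: str) -> str:
        """Name that is unique to this copy within the process."""
        return f"{prefix}.{os.getpid()}.{self.token}"
```

Two `LibraryInstance` objects in one process yield different names for the same prefix, while `pid_scoped_name` always yields the same one. The cost is that such a name is no longer predictable from outside the process, which may defeat the purpose of something like a pidfile.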
DLL hell is painful, and it can’t be prevented altogether. Perhaps we could produce a set of library design guidelines that developers could follow to produce DLL hell-safe libraries. The first step would be to make sure that the run-time can cope. Fortunately the Solaris linker provides “direct binding” (-B direct) and “groups” (-B group and RTLD_GROUP), so that between the two (and run-path and the like) it should be possible to ensure that each consumer of some library always gets the right one (provided one does not use LD_PRELOAD). Perhaps between linker features, careful coding and careful use, DLL hell can be made survivable in most cases. Thoughts? Comments?