Even though I did read Donnie’s posts with his proposal on how to better Gentoo, I don’t really want to discuss them at the moment. I have to admit that Gentoo politics is far from what I want to care about right now: I have little time lately, and in my opinion that little time should not be wasted discussing non-technical stuff, since technical stuff is what I do best. Yet I wanted to at least discuss one problem I see with most non-technical ideas about how to improve Gentoo: the bug count idea.
Somehow, it seems like management-kind people like to use the bug count of something as a metric to decide whether it’s good or not. I disagree with this ferociously because it really does not say much about software, especially if bugs that have just been temporarily worked around are counted as closed. It would be like considering the count of lines of code as a metric to evaluate the value of software. Any half-decent software engineer knows that the number of lines of code alone is pointless, and that you should really consider the language (it really changes a lot if the lines can contain one or twenty instructions each), and the comment-to-code ratio.
If the Gentoo developers were evaluated based on how many open bugs there are or on the average time a bug is kept open, we’d end up with a huge number of bugs closed as “need info”, “invalid” or “works for me” (which carries a vague “you suck” idea), and of course “later”, which is one kind of resolution I really loathe. The problem is that sometimes, to reduce the impact of a bug on users, you should put in a quick workaround, like a -j1 in the emake call, or you’ll have to wait for something else to happen (for instance you might need to use built_with_use checks until the Portage version that supports EAPI 2 is stable).
Here the problem is that the standard workflow used in Bugzilla does not suit the way Gentoo works at all. And not everybody in Gentoo follows the same policy when it comes to bugs. For instance, many people would find that any problem with the upstream code does not concern Gentoo; others feel like crashes and similar problems should be taken care of, but improvements and other requests should be sent upstream. Some people think that parallel make is a priority (I’m one of them) but others don’t care. All in all, before we decide to measure the performance of developers based on bugs, we should first come up with a strict policy on how bugs are handled.
But still, open bugs mean that we know there are problems, and might or might not mean that we’re actively working on solving them; it might well be that we’re waiting for someone else to take care of them, for instance. How should we deal with this?
I sincerely think that we should add more states to the bug state machine. For instance we could make it so that there’s a “worked around” state, which would be used for stuff like parallel make being disabled, --as-needed turned off and similar. It would mean that the bug is still there and should be resolved, but in the meantime users shouldn’t be hitting it any longer. Furthermore, we should have a way to take a bug off our radars for a while by setting it “on hold” till another bug is solved. For instance, once the new Portage goes stable we can deal with all the bugs that need EAPI 2 features.
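To make the idea a bit more concrete, here is a minimal sketch of what such a state machine could look like, written as two shell functions. The state names and the allowed transitions are my own assumptions for illustration; nothing like this exists in Bugzilla as shipped, and the only rules taken from the proposal are the two extra states and the requirement that somebody other than the resolver confirms the fix.

```shell
# Hypothetical bug states: open, worked_around, on_hold, resolved, closed.
# "worked_around" and "on_hold" are the two additions proposed in the text;
# the transition table itself is an assumption made for this sketch.

# can_transition FROM TO: succeeds (exit 0) if the move is allowed.
can_transition() {
    case "$1:$2" in
        open:worked_around|open:on_hold|open:resolved) return 0 ;;
        worked_around:open|worked_around:resolved)     return 0 ;;
        on_hold:open)                                  return 0 ;;
        resolved:closed|resolved:open)                 return 0 ;;
        *)                                             return 1 ;;
    esac
}

# can_close RESOLVER CLOSER: the person confirming the fix must not be
# the same person who marked the bug as resolved.
can_close() {
    [ "$1" != "$2" ]
}
```

Note that a worked-around bug cannot jump straight to closed in this sketch: it has to be properly resolved first, which is the whole point of keeping the state separate.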
Then we should make sure that bugs that are properly resolved, and confirmed so, are closed, so that we can be sure that they won’t come up again. Of course the person who marked the bug as resolved can’t be the one to mark it as closed; someone else in the same team, or the user who reported the bug, should be able to confirm that the resolution is correct and the bug is definitely closed.
But the most important thing for me is to take away the idea that bugs are a way to measure how software sucks and consider them instead a documentation of what has yet to be done. This is probably why trackers different from Bugzilla often change the name of the entries to “tasks” or “issues”; language has a high psychological impact, and we might want to deal with that ourselves.
So, how many tasks have you completed today?
Although most of the software in Portage already follows the correct rule of not putting its plugins in /usr/lib or equivalent directories, during my collision detection analysis I did find a few packages that instead pollute the standard library path with their plugins.
In general, I see way too much pollution in the library path, and that is quite bad since the loader will have to iterate through all of it whenever it has to find a library that is not in the cache, or if environment variables are set that force it to look for libraries in a different order. Also, the linker (which is quite slow already) needs to look there to find the libraries to link to, which means more and more work every time, to scan everything.
For this reason, the number of files in /usr/lib that do not need to be there should really be reduced drastically. This means, for instance, not installing shell scripts there, like quite a bit of software does, and especially not installing plugins, since those would be picked up by the ldconfig utility and written into a cache file that should really be as small as possible.
It’s not just that, though; there are more problems with software installing plugins. For instance, I noticed that in the PAM modules directory /lib/security there is one plugin with a full-versioned name (that is, with the final .0.0, which is not needed by PAM modules) and another that uses the lib prefix (also unneeded); both are likely built with libtool by somebody who does not know how to properly build plugins.
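A check for those two naming mistakes could look something like the following. This is only a sketch: the function name `check_plugin_names` is made up, and it looks purely at file names, so it says nothing about whether the files are actually ELF objects or valid PAM modules.

```shell
# check_plugin_names DIR: print the files in DIR whose names look like
# regular shared libraries (lib prefix, or a version suffix after .so)
# rather than like plugins. Name-based only; file contents are not inspected.
check_plugin_names() {
    for f in "$1"/*; do
        [ -e "$f" ] || continue        # directory empty or glob did not match
        name=${f##*/}
        case "$name" in
            lib*.so|lib*.so.*) echo "lib-prefix: $name" ;;
        esac
        case "$name" in
            *.so.[0-9]*) echo "versioned: $name" ;;
        esac
    done
}
```

Running `check_plugin_names /lib/security` would then list the suspicious modules, one per line, tagged with the reason they were flagged.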
I really should be writing a further test to make sure that packages get installed following the usual layout; I just don’t know how for now. It would be a nice way to identify “rogue” packages that install outside of the standard-defined directories.
In general, whenever you want to install plugins you should do so by putting them in a separate directory inside /usr/lib, usually pkglibdir when using automake (defaults to /usr/lib/PACKAGE_NAME). This also allows a much simpler approach to plugin loading. And if you’re not really allowing plug-ins to be added after the ones compiled at build time, then you should probably not use plugins but rather build them in.
I guess there is more work for my tinderbox now…
When checking today’s failure logs from the tinderbox, to see if there are packages failing with readline 6.0 besides GDB, I’ve noticed a few more failures since last time, mostly related to the new gcc 4.3.3 ebuild and the fact that -D_FORTIFY_SOURCE=2 is now enabled by default. Which by the way is what makes the whole thing too noisy for my taste.
While there are a few cases where the code is explicitly being rejected by the compiler for being wrong, most of the failures seem to be extra warnings that morph into failures because of the use of -Werror in released code. I think I talked about this kind of problem in passing in the past, but never wrote a full entry about it. I think the time has come for that.
The -Werror flag for GCC (used also by ICC and equivalent to Sun’s -errwarn=%all) is often considered useful to make sure one’s code is solid enough to build without warnings. This is good since warnings often enough result in errors, and as I noted yesterday having too many warnings can cause new ones to be ignored and thus create a domino effect to the point the whole software is screwed. But, it’s not a good idea to unconditionally set it up in the released code.
Why do I say that? Because things change, and especially, warnings are added. The fact that a new compiler is stricter and considers a particular piece of code as something to warn about is not going to change the quality of the software per se; and while it’s true that fixing the warnings early can save you from failures further down the road, users often enough just need the thing to build and work when they need it, and would prefer not to have to fix a few warnings first.
So we have two opposite considerations: enabling -Werror can allow the developers and the users interested in the total correctness of the program to identify new warnings earlier, and at the same time the remaining set of users who don’t care about correctness but just want the thing to work would like -Werror disabled. What’s the solution? First of all, learn to use particular -Werror= flags (see this old post of mine for some information about it), and then make the thing optional.
See, this is what makes free software quite powerful sometimes: optionality. Just add a switch, a knob, a line to comment in and out, so that -Werror is used by default on developers’ builds but not on the normal user releases. For most non-autoconf-based build systems, -Werror is just passed along with the rest of the CFLAGS, so it’s easy to deal with. For autoconf-based systems, it’s not rare that it’s added at the end of the script, unconditionally. Why does that happen? Because passing it to the ./configure call like any other compiler flag will almost certainly cause some autoconf checks to fail. No more, no less.
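For a hand-written configure-style script, the switch can be as small as the sketch below. The names `build_cflags` and `ENABLE_WERROR` are made up for the example; the only real point is that -Werror gets appended after any compiler feature checks have already run, and only when explicitly asked for.

```shell
# build_cflags BASE_FLAGS [yes|no]
# Print BASE_FLAGS, with -Werror appended only when the second argument
# is "yes". Meant to be called after any feature checks have run, so the
# checks themselves never see -Werror.
build_cflags() {
    flags=$1
    if [ "${2:-no}" = "yes" ]; then
        flags="$flags -Werror"
    fi
    printf '%s\n' "$flags"
}
```

A developer would then build with something like `CFLAGS=$(build_cflags "-O2 -Wall" "${ENABLE_WERROR:-no}")` and `ENABLE_WERROR=yes` in the environment, while a plain user build gets the same flags minus -Werror.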
True, sometimes what is a warning in one version of GCC becomes an error in the next, so it’s not really a solution if the warnings are not taken care of. But that’s the very reason why GCC usually introduces them as warnings! It gives developers time to act on them before the code gets rejected out of the blue. Of course it would be nicer if GCC also added an extra specification like “this will become an error with release X.Y.Z”, but even that would often be ignored, so it does not really matter.
This becomes even more important for ebuild developers, since having -Werror enabled does not really work well with Gentoo: we might add new, stricter GCC versions, or users might add one more entry to CFLAGS to enable further warnings (for instance I have -Wformat=2 -Wstrict-aliasing=2 -Wno-format-zero-length in mine), which would then cause packages to fail out of the blue. Unfortunately it seems that quite a few packages still use -Werror in their default build, and not all Gentoo maintainers took care of removing it beforehand.
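As an illustration of what “removing it” amounts to, here is a small sketch that drops the blanket -Werror from a flags string while leaving targeted -Werror=something entries alone. `strip_werror` is a made-up helper, not a Portage function, and it assumes the flags are whitespace-separated and contain no shell glob characters.

```shell
# strip_werror "FLAGS": print FLAGS without the blanket -Werror.
# Targeted forms such as -Werror=format-security are kept on purpose.
# Assumes whitespace-separated flags with no glob characters in them.
strip_werror() {
    out=
    for flag in $1; do
        case "$flag" in
            -Werror) ;;                      # drop the blanket flag
            *) out="${out:+$out }$flag" ;;   # keep everything else
        esac
    done
    printf '%s\n' "$out"
}
```

In practice the flag is often hardcoded in a Makefile or configure script rather than in a variable, so the real fix is usually a patch or a sed on the build files; the function just shows which flag should go and which can stay.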
So please, don’t use -Werror in released code, make it optional, use it during development, but not in released code. And not in ebuilds either.
A little context for those reading me: I’m writing this post on a Friday night when I was planning to meet with friends (why I didn’t is a long story and not important now). After I accepted that I wouldn’t have a friendly night I decided to finish a job-related task, but unfortunately I’ve had some issues with my system. Somehow the latest radeon driver is unstable (well, it’s an experimental driver after all), and it messes up compiz; in turn, after a while either X crashes or I’m forced to restart it. This wouldn’t be a problem if the emacs daemon worked as expected. Since it doesn’t, I lose my workspace, with the open files, and everything related to that. It’s obnoxious. Since this happened four times already today I decided to take the night off, but I wasn’t in the mood for playing, so I settled for watching Pirates of the Caribbean 2 on Blu-ray and writing out some notes regarding the topics I have wanted to write about for quite a while.
The choice of topic was related to the actual context I’ve just written above. As I said GNU Emacs is acting badly when it comes to the daemon. While the idea of the daemon would be to share buffers (open files) between ttys, network and graphical session, and to actually allow restarting those sessions without losing your settings, your data, and your open files, it’s pretty badly implemented.
A few months ago I reported that as soon as X was killed by anything (or even closed properly), the whole emacs daemon went down. After some debugging it turned out to be a problem with the handling of message logging. When the clients closed they sent a message to be logged by the emacs daemon, but since it had no way to actually write it to a TTY session, it died. That problem has been solved.
Now the problem appears to be just the same, mirrored: after X dies, the emacs daemon process is still running, but as soon as I open a new client, it dies. I guess it’s still trying to log something. As of today the problem still happens with the CVS version.
So anyway, this reminded me of a problem I already wanted to discuss on the blog: user-tied services. Classically, you had user-level software that is started by a user, and services that are started by the init system when the system starts up. With time, software became less straightforward. We have hotplugged services, which start up when you connect hardware like, for instance, a bluetooth dongle, and we have session software that is started when you log in and is stopped once you exit.
Now, most of the session-related software is started when you log into X and is stopped when you exit. Sometimes, though, you want processes to persist between sessions. This is the case for emacs, but also my use case for PulseAudio, since I want it to keep going from before I log in straight through until I shut down the system. There are more cases of similar issues but let’s start with this for now.
So how do we handle these issues? Well, for PulseAudio we have an init script for the systemwide daemon. It works, but it’s not the suggested method to handle PulseAudio (on the other hand it is probably the only way to have a multi-user setup with more than one user able to play sound, but that’s for another day too). For emacs, we have a multiplexed init script that provides one service per user; a similar method is available for other services. Indeed, in my list of things to work on regarding PulseAudio there is adding a similar multiplexed init script to run per-user sessions of PulseAudio without using the systemwide instance (it should solve a few problems).
So the issue should be solved with the multiplexed per-user init script, no? Unfortunately, no. To be able to add the init scripts to the runlevels to be started, you need to have root privileges. To start, stop and restart the services, you also need root privileges. While you can use sudo to allow users to run the start/stop commands of the init script, this is far from being the proper solution.
What I’d like to have one day is a way to have user-linked services, in three types of runlevels: always running (start when the machine starts up, stop when the system shuts down), after the first login (the services are started at the first login and never stop till shutdown), and while logged in (the services start at the first login, and stop when the last session logs out).
At that point it would then be possible to provide init scripts capable of per-user multiplexing for stuff like mpd too, so that users could actually have the flexibility of choosing how to run any software in the tree.
Unfortunately I don’t have any idea on how to implement this right now, but I guess I could just throw this in for the Summer of Code ideas.
That’s a very common question lately, and somehow I feel like most people who haven’t dealt with ALSA in the past would find it very difficult to answer it properly. I myself would have ignored one particular issue till last night, when I hit another reason why I want to make PulseAudio my main and only audio system as soon as possible, reducing direct ALSA access.
With modern systems you mostly get onboard sound cards, rather than PCI cards and similar, unless you’re somewhat crazy and want a proaudio card (like I did) or a Creative card (not yet sure what the fuss is about that). To be honest, my motherboard really does have a limited sound card, so I would have needed a PCI card anyway to get digital S/PDIF output, and I want S/PDIF because the room has so much electric noise that I can hear it on the speakers.
Somehow, ALSA seems to hide from most users that there are indeed a lot of limitations with the HDA hardware. This is because the idea for HDA was probably to shift as much work as possible into software space rather than doing it in hardware; this is why you need a driver fix for the headphones jack to work properly most of the time. While this can be seen as a way to produce cheap cards, it has to be said that software always had to compensate for hardware defects, since it’s more flexible. And having most of the processing done in software means that you can actually fix it if there is a bug, rather than having to find a workaround if anything.
So, no hardware mixing, and ALSA has to use dmix; no hardware volume handling, and ALSA has to use softvol; no hardware resampling, and ALSA has its own resamplers; and so on and so forth. But ALSA has to do these things in the same process that is doing the audio output, and thus is not so good at coping with different processes accessing the audio device with different parameters.
Having a separate process doing the processing does not mean that you add more work to the system. You can actually make it do less work if, for instance, you ask that process to play a (cached) sound rather than having it opened, converted, and played back.
At the same time, the fact that PulseAudio handles mixing, volume and resampling in software does not mean that it adds to the work done by the software for most cards, since the most common cards nowadays, the HDA-based ones, already do all that in software; you just don’t see that explicitly because ALSA does it in-process.
In my opinion, ALSA is really doing way too much as a library, since each time you open a device it has to parse a number of configuration files, to identify which definition to use (and even then, by default they are far from perfect, for instance for the ICE1712-based cards, which depending on the way they are wired may have two, four, six or eight channel outputs, the definition only suits the 8-channel model, and does not make sense with the lower end cards). And once it found the definition it has to initialise a number of plugins, internal or external, to perform the software functions.
And this is all without putting in the mix the LISP interpreter that alsa-lib ships with (I’d leave to Lennart to explain that one).
So if you think that PulseAudio is an over-engineered piece of software that performs functions in software that should be done in hardware, you better be a FreeBSD user. At least there the OSS subsystem does not try to do all the things that ALSA does (and on the other hand FreeBSD is trying to go the Microsoft way for the HDA cards, implementing the UAA specifications rather than having a huge table of quirks; as far as I can see that would be quite useful, if it’s going to work that is).
But even in that case you should probably find PulseAudio a good thing since it tries not to be Linux-specific and thus would allow for performing all the needed functions with a single cross-platform software without having to reimplement it in platform-specific systems like OSS or ALSA. That was my main reason to look into PulseAudio when I was working on Gentoo/FreeBSD.
For everyone who cares about users, let’s try to work all together to make PulseAudio better rather than attacking Lennart for trying to do the right thing, okay?
When I read some rants about Firefox I thought they were a little bit too much. Now, I start to wonder if they were quite to the point instead. But before I start I have to say I haven’t tried contacting anybody yet, neither from the Gentoo Mozilla team nor upstream. And I’m sure the Gentoo Mozilla team are doing their best to make sure that they can provide a working Firefox while still following upstream guidelines on trademarks.
This actually sprouted from my previous work inspecting library paths: I went to check which libraries for firefox-bin were loaded from the system library directory, and noticed one curious thing: /usr/lib/libsqlite3.so was being loaded. What’s the problem? The problem is that I knew that xulrunner (at least built from sources) bundles its own copy of SQLite3, so I wondered if they used the system copy for the binary package. Funnily enough, they really don’t:
yamato link-collisions # ldd /opt/firefox/firefox-bin | grep sqlite3
libsqlite3.so => /opt/firefox/libsqlite3.so (0xf67e7000)
libsqlite3.so.0 => /usr/lib/libsqlite3.so.0 (0xf621e000)
yamato link-collisions # lddtree.sh /opt/firefox/firefox-bin | grep sqlite3 -B1
libxul.so => /opt/firefox/libxul.so
libsqlite3.so => /opt/firefox/libsqlite3.so
--
libsoftokn3.so => /usr/lib/nss/libsoftokn3.so
libsqlite3.so.0 => /usr/lib/libsqlite3.so.0
(The lddtree.sh script comes from pax-utils and uses scanelf. I have a similar script in my Ruby-Elf suite implemented as a testcase, it produces the same results, basically.)
So the binary version of the package uses the system copy of NSS and thus loads the system copy of SQLite3. I haven’t gone as far as checking where the symbols were resolved, but one of the two is going to be loaded and unused, wasting memory (clean and dirty, for relocated data sections). Not nice, but one can say it’s the default binary, and has to know to adapt. In truth the problem here is that upstream didn’t use rpath, and thus the firefox-bin program does not load all its libraries from the /opt/firefox directory (since the /usr/lib/nss directory comes first). Had they built their binary with rpath set to $ORIGIN it would have loaded everything from /opt/firefox without caring about the system libraries, like it was intended to do. Interestingly enough, they do just that for Solaris, but not for Linux where they prefer fiddling with LD_LIBRARY_PATH.
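To check whether a binary carries an rpath or runpath at all, the dynamic section printed by `readelf -d` can be filtered. The sketch below is just that: `extract_runpath` is a made-up helper that reads `readelf -d` output on standard input and prints whatever is between the brackets on the RPATH and RUNPATH lines.

```shell
# Linking with an $ORIGIN-relative runpath looks like this (note the
# single quotes, so the shell does not expand $ORIGIN):
#   cc -o app main.o -L. -lfoo -Wl,-rpath,'$ORIGIN'

# extract_runpath: read `readelf -d` output on stdin and print the value
# of every RPATH/RUNPATH entry, one per line. Prints nothing if the
# binary has neither.
extract_runpath() {
    awk '/\((RPATH|RUNPATH)\)/ {
        start = index($0, "[")
        end = index($0, "]")
        if (start > 0 && end > start)
            print substr($0, start + 1, end - start - 1)
    }'
}
```

Something like `readelf -d /opt/firefox/firefox-bin | extract_runpath` printing nothing would be consistent with the binary depending entirely on LD_LIBRARY_PATH and the system search path.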
Next, I checked the /usr/bin/firefox starter script, which I already copied in the other post:
#!/bin/sh
export LD_LIBRARY_PATH="/usr/lib64/mozilla-firefox"
exec "/usr/lib64/mozilla-firefox"/firefox "$@"
Let’s ignore the problem with the rewriting of the environment variable, which I don’t care about right now, and check what it does. It adds the /usr/lib64/mozilla-firefox directory to the list of paths to load libraries from. Since it’s setting LD_LIBRARY_PATH all the library resolutions will have to be done manually rather than using the ld.so.cache file. So I checked which libraries it loads from there:
flame@yamato ~ % LD_LIBRARY_PATH=/usr/lib64/mozilla-firefox ldd /usr/lib64/mozilla-firefox/firefox | grep mozilla-firefox
flame@yamato ~ % scanelf -E ET_DYN /usr/lib64/mozilla-firefox
TYPE FILE
ET_DYN /usr/lib64/mozilla-firefox/libjemalloc.so
(The second command finds all the libraries in the given path, by checking for ET_DYN, dynamic ELF, files.)
Okay, so there is one library, but it’s not in the NEEDED lines of the firefox executable. Indeed that library is a preloadable library with a different malloc() implementation (remember I’ve written about similar things and commented about the FreeBSD solution), which means it has to be passed through LD_PRELOAD to be useful, and I can’t see that being used at all. Indeed, if I check the loaded libraries on my firefox process I can’t find it:
flame@yamato x86 % fgrep jemalloc /proc/`pidof firefox`/smaps
flame@yamato x86 %
Let’s go step by step though. For now we can say with enough safety that the launcher script is overwriting LD_LIBRARY_PATH for no apparent good reason. Which libraries does the firefox executable load then?
flame@yamato ~ % LD_LIBRARY_PATH=/usr/lib64/mozilla-firefox ldd /usr/lib64/mozilla-firefox/firefox
linux-vdso.so.1 => (0x00007fffcabfd000)
libdl.so.2 => /lib/libdl.so.2 (0x00007fa5c2647000)
libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/libstdc++.so.6 (0x00007fa5c2338000)
libc.so.6 => /lib/libc.so.6 (0x00007fa5c1fc5000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa5c284b000)
libm.so.6 => /lib/libm.so.6 (0x00007fa5c1d40000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007fa5c1b28000)
flame@yamato ~ % scanelf -n /usr/lib64/mozilla-firefox/firefox
TYPE NEEDED FILE
ET_EXEC libdl.so.2,libstdc++.so.6,libc.so.6 /usr/lib64/mozilla-firefox/firefox
It can’t be right, can it? We know that Firefox loads GTK+ and a bunch of other libraries, starting with xulrunner itself, but there is no link to those. But if you know your linker you should notice a funny thing: libdl.so.2. It means the executable is calling into the loader at runtime, which usually means dlopen() is used. Indeed it seems like the firefox executable loads the actual browser at runtime, as you can see by checking the smaps file.
Now there are two things to say here. There is a reason why firefox would be doing that: calling “firefox” when it is open already should actually request a new window to be opened, rather than starting a new process. So basically I expect the executable to contain a launcher that, if a copy of firefox is running already, just tells that copy to open a new window, and otherwise loads all the libraries and stuff. It’s a good idea from one point of view, because initialising all the graphical and rendering libraries just to tell another process to open a window would be a waste of resources. On the other hand, dlopen() is not the best performing approach and also creates problems for prelink.
I have no idea why it happens, but the binary package as released by upstream provides a script that seems to be taking care of the launching, and then a firefox-bin executable that doesn’t use dlopen() to load the Gecko engine and all the graphical user interface. I would very much like to know why we don’t do the same for from-source builds, I would sincerely expect that the results would be even better when using prelink and similar.
Now, let’s return a moment to the problem of the SQLite3 loaded twice for the binary release of Firefox, surely the same wouldn’t happen for the from-source version, would it? Check it by yourself:
flame@yamato x86 % fgrep sqlite /proc/`pidof firefox`/smaps
7fea6c8c2000-7fea6c935000 r-xp 00000000 fd:08 701632 /usr/lib64/libsqlite3.so.0.8.6
7fea6c935000-7fea6cb35000 ---p 00073000 fd:08 701632 /usr/lib64/libsqlite3.so.0.8.6
7fea6cb35000-7fea6cb36000 r--p 00073000 fd:08 701632 /usr/lib64/libsqlite3.so.0.8.6
7fea6cb36000-7fea6cb38000 rw-p 00074000 fd:08 701632 /usr/lib64/libsqlite3.so.0.8.6
7fea814dc000-7fea8154f000 r-xp 00000000 fd:08 24920 /usr/lib64/xulrunner-1.9/libsqlite3.so
7fea8154f000-7fea8174f000 ---p 00073000 fd:08 24920 /usr/lib64/xulrunner-1.9/libsqlite3.so
7fea8174f000-7fea81751000 r--p 00073000 fd:08 24920 /usr/lib64/xulrunner-1.9/libsqlite3.so
7fea81751000-7fea81752000 rw-p 00075000 fd:08 24920 /usr/lib64/xulrunner-1.9/libsqlite3.so
Yes, yes it does happen. So I have a process that is loading one library for no good reason at all at runtime, and not a little one at that, when it could probably, at this point, use a single system SQLite library. I say that it could, because now I have enough evidence to support that: if the two libraries had a different ABI, depending on which one the symbols resolve to, either xulrunner or NSS would be crashing down. Since ELF uses a flat namespace, the same symbol name cannot be resolved in two different libraries, and thus one of the two libraries using them would find them in the “wrong” copy. And no, before you ask, neither uses symbol versioning.
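Spotting this kind of duplicate by eye in a long smaps file is tedious, so here is a rough sketch that lists each distinct mapped file matching a name. `mapped_copies` is a made-up helper; it assumes the usual maps/smaps layout with the path in the sixth field, and it will get paths containing spaces wrong.

```shell
# mapped_copies NAME: read /proc/PID/smaps (or maps) content on stdin and
# print every distinct mapped file whose path contains NAME.
# Assumes the path is the sixth whitespace-separated field of a mapping line.
mapped_copies() {
    awk -v name="$1" '
        # Only mapping lines start with an address range; the per-mapping
        # detail lines of smaps (Size:, Rss:, ...) are skipped.
        ($1 ~ /^[0-9a-f]+-[0-9a-f]+$/) && (NF >= 6) && index($6, name) && !seen[$6]++ {
            print $6
        }
    '
}
```

With the process above, `mapped_copies sqlite < /proc/$(pidof firefox)/smaps` should print two paths, and more than one line of output for the same library name is the hint that two copies are mapped.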
So at this point the question is: can both Firefox upstream and the Gentoo Firefox ebuild start providing something that does more than just working and actually works properly?
I’m not sure why that happens, but almost every time I’m working on something, I find something different that diverts my attention, providing me with more insight on what is going wrong with software all over. This is what causes me to have a TODO list that continuously grows rather than stopping for a while.
Today, while I went on working on my dependency checker script which for now does not even exist, but for which I added a few more functions to Ruby-Elf (you can check them out if you wish), I’ve started noticing some interesting problems with the library search path of my tinderbox chroot.
The problem is that I have 47 entries in the /etc/ld.so.conf file, which means 47 paths that need to be scanned, plus two entries created by the env files as LD_LIBRARY_PATH, and yet I find bundled libraries with the Portage bash trigger that my collision detection script misses. This made me understand there is more to this than I have been seeing up to now, and got me to dig deeper.
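Counting those entries is a one-liner, but for completeness here is a sketch of how it can be done. `count_ld_paths` is a made-up name, and the function makes two simplifying assumptions: one directory per line, and `include` directives are skipped rather than followed.

```shell
# count_ld_paths FILE: count the directory entries in an ld.so.conf-style
# file. Comments and blank lines are ignored; "include" lines are skipped,
# not followed. Assumes one directory per line.
count_ld_paths() {
    sed -e 's/#.*//' "$1" |
        awk 'NF > 0 && $1 != "include" { n++ } END { print n + 0 }'
}
```

Calling `count_ld_paths /etc/ld.so.conf` gives the number of directories the loader has to consider on a cache miss, at least for the main file.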
So I checked out some of the entries in the ld.so.conf file. Some of them are due to multiple slots per library for different library versions, like QCA does. This is one decent way to handle the problem: putting two libraries that differ just by soname in different directories, and passing the proper -L flag to get one or the other. A much saner alternative would be to ensure that two libraries with different APIs get different names, just like glib 2.0 gets a libglib-2.0.so name, but that’s a different point altogether.
Some other entries are more worrisome. For instance, the cuda packages install all their libraries in /opt/cuda/lib, but the profiler also installs Qt4 libraries there, causing them to appear in the system library list, even though with lower priority than the copy installed by the Qt ebuilds. Similar things happen with the binary packages of Firefox and Thunderbird, since I have the two of them together with xulrunner, each adding its library path to the system library path.
Instead of doing this, there are a few packages that install the libraries outside the library path, and then use a launcher script, whose main use is setting the LD_LIBRARY_PATH variable to make sure the loader can find their libraries. This is what, for instance, /usr/bin/kurso does. Indeed if you start the kurso (closed-source) executable directly you’re presented with a failure:
# /opt/kurso/bin/kurso3
/opt/kurso/bin/kurso3: symbol lookup error: /opt/kurso/bin/kurso3: undefined symbol: initPAnsiStrings
The symbol in question is defined in the libborqt library as provided by Kurso itself, which is loaded through dlopen() (and thus does not appear in either ldd or scanelf -n output, just to remind you of this problem). I could now write a bit about the amount of software written using Kylix and the number of copies of libborqt in the system, each with its own copy of libpng and zlib, but that’s beside the point for now. The problem is that it indeed does not work by just starting it directly, and thus a wrapper file is created.
These wrappers are a necessary evil for proprietary closed-source software, but are quite obnoxious when I see them for Free Software that just gets built improperly; that’s the case both for the binary packages of Firefox and similar and for the source packages, since my current /usr/bin/firefox contains this:
#!/bin/sh
export LD_LIBRARY_PATH="/usr/lib64/mozilla-firefox"
exec "/usr/lib64/mozilla-firefox"/firefox "$@"
This has a minor evil in it (it overwrites my current LD_LIBRARY_PATH settings) and a nastier one: the programs it calls will get the same value too, unless it’s unset later on. In general this is not a tremendously bright idea, since we do have a method, the use of DT_RUNPATH in ELF files, that allows us to tell an executable to find its libraries in a directory that is not in the path. This is tremendously useful, and should really be used much more, since setting LD_LIBRARY_PATH also means skipping over the loader cache to find the libraries for all the processes involved rather than just for the (direct) dependencies of a single binary. You can see that this has been a problem for Sun too.
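If a wrapper really cannot be avoided, the minor evil at least is easy to fix by prepending instead of overwriting. The sketch below uses a made-up helper, `prepend_library_path`; it takes care not to leave a trailing colon when the variable was empty, since an empty element in LD_LIBRARY_PATH is treated as the current directory.

```shell
# prepend_library_path DIR CURRENT
# Print DIR followed by CURRENT, separated by a colon, without leaving a
# trailing colon when CURRENT is empty.
prepend_library_path() {
    printf '%s\n' "$1${2:+:$2}"
}

# A launcher using it would look roughly like this:
#   LD_LIBRARY_PATH=$(prepend_library_path /usr/lib64/mozilla-firefox "${LD_LIBRARY_PATH-}")
#   export LD_LIBRARY_PATH
#   exec /usr/lib64/mozilla-firefox/firefox "$@"
```

This only addresses the overwriting: every child process still inherits the modified variable, so it is a smaller wart, not a substitute for a proper runpath.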
For what it’s worth, the OpenSolaris linker, provides special padding space since two years ago to allow changing the runpath of executable files you don’t have access to (like the kurso binary above). It’s too bad that we have to deal with software lacking such padding because it would be quite useful too.
Is this all? Not really, no, since you also get env.d entries for packages like quagga that seem to install some libraries in a sub-directory of the standard library directory and then add that to the set of scanned directories. I haven’t investigated this much yet; this blog post was intended as just a warning, and I’ll probably try to work further on this to make it work better. But in general, if your package installs internal “core” libraries that you rightly don’t want in the standard library path (to avoid polluting the search path), you should not add their directory to ld.so.conf through environment files, but should rather use the -rpath option during link to tell the executable to find the library in a different path.
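To check whether a binary actually carries such a path, the dynamic section is the place to look. The following is a rough sketch in Python that parses text in the shape readelf -d prints; the sample text is hand-written for illustration, and libfoo-core.so.0 and /usr/lib/foo are hypothetical names. Depending on how the linker is configured, the path given to -rpath may show up as RPATH or as RUNPATH, so the sketch collects both.

```python
import re

# Matches the NEEDED, RPATH and RUNPATH lines as printed by `readelf -d`.
_DYN_ENTRY = re.compile(r"\((NEEDED|RPATH|RUNPATH)\)[^\[]*\[([^\]]*)\]")


def dynamic_entries(readelf_output):
    """Collect DT_NEEDED, DT_RPATH and DT_RUNPATH values from `readelf -d` text."""
    entries = {"NEEDED": [], "RPATH": [], "RUNPATH": []}
    for line in readelf_output.splitlines():
        match = _DYN_ENTRY.search(line)
        if not match:
            continue
        tag, value = match.groups()
        if tag == "NEEDED":
            entries[tag].append(value)
        else:
            # Search paths are colon-separated lists of directories.
            entries[tag].extend(value.split(":"))
    return entries


# Hand-written sample; the library and directory names are hypothetical.
SAMPLE = """\
 0x00000001 (NEEDED)                     Shared library: [libfoo-core.so.0]
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]
 0x0000001d (RUNPATH)                    Library runpath: [/usr/lib/foo]
"""

if __name__ == "__main__":
    print(dynamic_entries(SAMPLE))
```

A package whose internal libraries live in a sub-directory should show that sub-directory here, rather than in an env.d file.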
For developers, if your package does some funny thing with its libraries and you’re not sure if you’re handling it properly, may I just ask you please to tell me about it? I’d be glad to look into it, and eventually post a full explanation of how it works in the blog so that it can be used as reference for the future. I also have a few very nasty issues to blog about on Firefox and xulrunner, but I’ll save those for another time, for now.
One of the worst things that can happen in somebody’s life is when your own dreams scare you out of your sleep. As it turns out I’m in one of those situations. A nice period of my life ended just before Christmas, and now I’m in a bit of a pinch, with a late job, and no future (stable) job in view. I’m also out of luck with publishers since the last article I submitted to LWN was not even worth a reply, it seems.
I should be at least well happy about my health, one would expect, given that I am feeling better after the surgery and I just need to visit the hospital for some check-ups now. But even that is out of schedule, since I was supposed to be in for January, and it’s mid-February now. The professor I had to reach is unreachable, so I had to pass through another doctor in the staff (to whom I’m very grateful for my previous stay too!).
But as it is said in Italy “one Pope dead, a new one is made”; I admit I’m not sure what the English equivalent would be but I’d expect it to refer to kings.
I’m currently feeling in quite a bluish mood but it’s going to be just fine as soon as I get some good nights’ sleep; relaxed sleep. The problem as I said is that my own dreams, or rather the content and the characters of the dreams I’m having lately, chase me out of bed. Even though I cannot remember the dreams by themselves, the general mood follows me when I wake up and, even though they should be pleasant dreams, they upset me very much.
Luckily I learnt to fight dreams, and nightmares, since I went to the hospital. My way of keeping them away from my mind is to listen to something that turns my attention to something much different just before sleeping. Podcasts have helped a lot about that, but sometimes I need more, longer content I haven’t listened to before. This is especially true when, like right now, Bill Maher is not on HBO so I cannot listen to new Real Time’s podcast episodes. For these times I corrupt a bit of my soul and buy audiobooks from the iTunes Store, yes with the freedom-hungry DRM on.
I was thus quite pleased when an anonymous sender sent me The Hitchhiker’s Guide to the Galaxy CDs from BBC Radio (and I have to say I envy British people for BBC Radio 4, News Quiz is one of my favourite shows). It also raised a technical problem for me, though: how to convert the CDs into a format that makes use of 100% of the iPod’s features using just Free Software? I’m afraid I’m unable to answer that question just yet but I hope to be able to soon. Also thanks to the (for now unknown, since it hasn’t arrived yet) person who sent me the “I’m Sorry I Haven’t a Clue” CDs. I’m not sure what it is but I find British humour refreshing. Yes I know this is neither normal nor sane …
The problem is that, the way this is going, I’m unable to rest, even when I sleep, and thus I cannot work for more than a few hours on Free Software without my head starting to ache. And it’s difficult to sleep in the first place. While I would like to try cutting down on coffee, it turns out that I’m quite addicted to caffeine, to the point that twice already in the past three weeks, when I tried to go a day without one, I got a migraine so powerful I was unable to crawl out of bed.
Anyway, so that you know, even if I haven’t blogged about it in a while, nor have I opened new bugs, the tinderbox (or tinderflame to make it distinct from Patrick’s) is still working and crunching data. The new disks do help, since there was one (I’m afraid I know which one, I’ll write about it specifically in the future) that would make the system get stuck on pdflush, which as you might guess is not the nicest of things. Now it seems to be working better.
Anyway, if you wish, I made a special list to see if I can solve my sleep deprivation (although I’m already waiting for a few things I ordered myself, so I should be set for a while), but even more importantly, there are two things I’m going to ask of the users and developers reading me.
If you’re a user, try to raise concern with upstream projects about problems like proper --as-needed usage, parallel build and similar. I know my blog isn’t exactly the nicest place to look up information, but it should have enough to go around for issues like that. Any upstream package that fixes parallel make, --as-needed or autotools by itself is one less package I’ll have to look at when I decide to push forward my agenda of having proper packages around.
If instead you’re a developer, please help me by at least reviewing what I write, correcting me if needed, and especially submitting patches to my projects if you see they are wrong or incomplete. Having people collaborate on my projects is one thing I always miss.
Since I have in my TODO list to work on two binutils problems (the warning on the softer --as-needed and the fix for the PulseAudio build), I also started wondering why I haven’t heard, or rather read, anything about the gold linker.
Saying that I’m disappointed does not really cover much of it to be honest, since I don’t really wish to switch to a linker written in C++ any time soon. But I really hoped that it would generate enough momentum to find a solution. Because, yes, the ld linker that ships with binutils is tremendously slow to link C++ code, and as Linkers & Loaders let me understand now, the problem is not just the length of the (mangled) symbol names, but also the way that templates are expanded and linked together.
But still, I think it’s really worth investigating some alternative, which in my opinion need not be written in C++, with all the problems related to that. Saying that the gold linker is fast just because of the language it is written in is absolutely naïve, since the problems lie quite deeper than that.
The main problem is that the current ld implementation is based, like the rest of the binutils tools, upon libbfd, an abstraction that allows supporting multiple binary formats, not just ELF. It basically allows using mostly the same interface on different operating systems with different executable formats: ELF under Linux, BSD and Solaris, Mach-O under Mac OS X and PE under Windows and more. While this makes for a much more powerful ld command, it’s actually a bit of a bottleneck.
Even though the thing is designed well enough not to crumble easily, it is probably a good area to investigate to find why it’s so slow. Having an alternative, ELF-only linker available for users, Gentoo users especially, would likely be a good test. This would follow the same thing that Apple does on OSX (GCC calls Apple’s linker) as well as Sun under Solaris with their copy of GCC.
While I’m all for generic code, sometimes you need to have specialised tools if you want to access advanced features of files, or if you want to have a fast, optimised software.
The same thing can be said for the analysis tools provided by binutils. As I’ve written in my post about elfutils, the nm, readelf and objdump tools as provided by binutils, being generic, lack some of the useful defaults and the different interface that elfutils has. Which goes to show why specialised tools could help here. I know that FreeBSD was working on providing replacements for these tools, under the BSD license as usual for them. While that’s certainly an important step, I don’t remember reading anything about a new linker.
As it is, I haven’t gone out of my way to see if there are already some alternative linkers that work under Linux, beside the one provided by Sun’s compiler in Sun Studio Express (which has lots of problems on its own). If there is already one we should look at how it stands for what concerns features.
What we desire from a specialised linker, beside speed, is proper support for .gnu.hash sections, --as-needed-like features, no text relocations emitted in the code (which is a problem gold used to have at least), and possibly better support for garbage collection of unused sections, so that it could be used in production code without the huge impact on performance that seems to come with -fdata-sections and -ffunction-sections.
I’m not going to work on this, but if somebody is interested in my opinion about using, in Gentoo, any linker in particular I’d be glad to look at them, not going to spare words though, so that you know.
I’m hoping this post is going to be useful for all the devs and devs to be that want to be sure their ebuilds have proper runtime dependencies. It sprouted from the fact that at least a few developers seemed oblivious to the implications of what I’m going to describe (which I described briefly on gentoo-core a few days ago, without any response).
First of all, I have to put my hands forwards and say that I’m going to focus on just the binary ELF packages, and this is far from a complete check for proper runtime dependencies. Scripting code is much more difficult to check, while Java is at least somewhat simpler thanks to the Java team’s script.
So you’ve got a simple piece of software that installs ELF executable files or shared libraries, and you want to make sure all the needed dependencies are listed. The most common mistake there is to check the link chain with ldd (which is just a special way to invoke the loader, dumping out the called libraries). This would most likely show you a huge amount of false positives:
yamato ~ # ldd /usr/bin/mplayer
linux-gate.so.1 => (0xf7f8d000)
libXext.so.6 => /usr/lib/libXext.so.6 (0xf7eec000)
libX11.so.6 => /usr/lib/libX11.so.6 (0xf7dfd000)
libpthread.so.0 => /lib/libpthread.so.0 (0xf7de5000)
libXss.so.1 => /usr/lib/libXss.so.1 (0xf7de1000)
libXv.so.1 => /usr/lib/libXv.so.1 (0xf7ddb000)
libXxf86vm.so.1 => /usr/lib/libXxf86vm.so.1 (0xf7dd4000)
libvga.so.1 => /usr/lib/libvga.so.1 (0xf7d52000)
libfaac.so.0 => /usr/lib/libfaac.so.0 (0xf7d40000)
libx264.so.65 => /usr/lib/libx264.so.65 (0xf7cae000)
libmp3lame.so.0 => /usr/lib/libmp3lame.so.0 (0xf7c37000)
libncurses.so.5 => /lib/libncurses.so.5 (0xf7bf3000)
libpng12.so.0 => /usr/lib/libpng12.so.0 (0xf7bcd000)
libz.so.1 => /lib/libz.so.1 (0xf7bb9000)
libmng.so.1 => /usr/lib/libmng.so.1 (0xf7b52000)
libasound.so.2 => /usr/lib/libasound.so.2 (0xf7a9a000)
libfreetype.so.6 => /usr/lib/libfreetype.so.6 (0xf7a13000)
libfontconfig.so.1 => /usr/lib/libfontconfig.so.1 (0xf79e6000)
libmad.so.0 => /usr/lib/libmad.so.0 (0xf79cd000)
libtheora.so.0 => /usr/lib/libtheora.so.0 (0xf799b000)
libm.so.6 => /lib/libm.so.6 (0xf7975000)
libc.so.6 => /lib/libc.so.6 (0xf7832000)
libxcb-xlib.so.0 => /usr/lib/libxcb-xlib.so.0 (0xf782f000)
libxcb.so.1 => /usr/lib/libxcb.so.1 (0xf7815000)
libdl.so.2 => /lib/libdl.so.2 (0xf7810000)
/lib/ld-linux.so.2 (0xf7f71000)
libjpeg.so.62 => /usr/lib/libjpeg.so.62 (0xf77ef000)
librt.so.1 => /lib/librt.so.1 (0xf77e6000)
libexpat.so.1 => /usr/lib/libexpat.so.1 (0xf77bf000)
libogg.so.0 => /usr/lib/libogg.so.0 (0xf77b9000)
libXau.so.6 => /usr/lib/libXau.so.6 (0xf77b4000)
libXdmcp.so.6 => /usr/lib/libXdmcp.so.6 (0xf77ae000)
In this output, for instance, you can see listed the XCB libraries, and Expat, so you could assume that MPlayer depends on those. On the other hand, it really doesn’t, and they are just indirect dependencies, that the loader will have to load anyway. To avoid being fooled by that the solution would be to check the file itself for the DT_NEEDED entries in the .dynamic section of the ELF file. This can be achieved by checking the output of readelf -d or much more quickly by using scanelf -n:
yamato ~ # scanelf -n /usr/bin/mplayer
TYPE NEEDED FILE
ET_EXEC libXext.so.6,libX11.so.6,libpthread.so.0,libXss.so.1,libXv.so.1,libXxf86vm.so.1,libvga.so.1,libfaac.so.0,libx264.so.65,libmp3lame.so.0,libncurses.so.5,libpng12.so.0,libz.so.1,libmng.so.1,libasound.so.2,libfreetype.so.6,libfontconfig.so.1,libmad.so.0,libtheora.so.0,libm.so.6,libc.so.6 /usr/bin/mplayer
As you can see here, MPlayer does not use either of those libraries directly, which means that they should not be in MPlayer’s RDEPEND. There is, though, another common mistake here. If you don’t use --as-needed (especially not forcing it), you’re going to get indirect and misguided dependencies. So you can only trust DT_NEEDED when the system has been built with --as-needed from the start. This is not always the case and thus you can get polluted dependencies. And thanks to the fact that the linker now silently ignores --as-needed on broken libraries, this is likely to create a bit of a stir.
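The comparison between the two outputs can be automated in a few lines. This is only a sketch in Python, working on text pasted from ldd and from a single data line of scanelf -n; the function names are made up for illustration and the sample is a trimmed excerpt of the MPlayer output above, not the full lists. It is subject to the same caveat: the DT_NEEDED side is only trustworthy if --as-needed was in use.

```python
def ldd_sonames(ldd_output):
    """Library names the loader would map, minus the vDSO and the loader itself."""
    names = set()
    for line in ldd_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        name = fields[0]
        # The loader is listed by absolute path; linux-gate/linux-vdso is
        # provided by the kernel and is not a file on disk.
        if name.startswith("/") or name.startswith(("linux-gate", "linux-vdso")):
            continue
        names.add(name)
    return names


def needed_sonames(scanelf_line):
    """DT_NEEDED names from one data line of `scanelf -n` (TYPE NEEDED FILE)."""
    fields = scanelf_line.split()
    return set(fields[1].split(","))


def indirect_dependencies(ldd_output, scanelf_line):
    """Libraries that ldd shows but the file itself does not ask for."""
    return ldd_sonames(ldd_output) - needed_sonames(scanelf_line)


# Trimmed excerpt of the output shown above, for illustration only.
LDD = """\
linux-gate.so.1 => (0xf7f8d000)
libX11.so.6 => /usr/lib/libX11.so.6 (0xf7dfd000)
libc.so.6 => /lib/libc.so.6 (0xf7832000)
libxcb.so.1 => /usr/lib/libxcb.so.1 (0xf7815000)
/lib/ld-linux.so.2 (0xf7f71000)
libexpat.so.1 => /usr/lib/libexpat.so.1 (0xf77bf000)
"""
SCANELF = "ET_EXEC libX11.so.6,libc.so.6 /usr/bin/mplayer"

if __name__ == "__main__":
    print(sorted(indirect_dependencies(LDD, SCANELF)))
    # prints ['libexpat.so.1', 'libxcb.so.1']
```

Whatever ends up in the result is a candidate for removal from RDEPEND, not a verdict; the sources still have the last word.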
One of the entries in my ever so long TODO list (explicit requests for tasks during donation helps, just so you know) is to write a ruby-elf based script that can check the dependencies without requiring the whole system to be built with --as-needed. It would probably be a lot like the script that Serkan pointed me at for Java, but for ELF files.
After you’ve got the dependencies seen by the loader right, though, your task is not complete yet. A program has more dependencies than it might appear to have, since it might require data files to be opened, like icon themes and similar, but also more important dependencies in the form of other programs or libraries. And that is not always too obvious. While you can check if the software is using the dlopen() interface to dynamically load further libraries, again using scanelf, that is not going to tell you much and you have to check the source code. Also the program can call another by way of the exec family of functions, or through system(). And even if your program does not call any of these functions you cannot be sure that you got the complete dependencies right without opening it.
This is because libraries add indirection to these things too. The gmodule interface in glib allows for dynamically loading plugins, and can actually load plugins you don’t see and check, and Qt (used to) provide a QProcess class that allows executing other software.
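As a first pass before opening the sources, the list of undefined symbols of a binary can at least be screened for the functions named above. A small sketch in Python follows; how you obtain the symbol list is left open, the function name and the hint texts are made up for illustration, and, as just said, a clean result proves nothing because libraries can hide the same calls.

```python
# The functions named in the text: dlopen(), the exec family and system().
RUNTIME_LOADERS = {"dlopen"}
EXEC_FAMILY = {"execl", "execlp", "execle", "execv", "execvp", "execvpe", "execve"}
SHELL_CALLS = {"system"}


def hidden_dependency_hints(undefined_symbols):
    """Map each suspicious symbol to a short note on why it deserves a look."""
    hints = {}
    for symbol in undefined_symbols:
        # Drop a symbol version suffix such as "@GLIBC_2.1" if present.
        name = symbol.split("@", 1)[0]
        if name in RUNTIME_LOADERS:
            hints[name] = "may load libraries at runtime"
        elif name in EXEC_FAMILY:
            hints[name] = "may execute another program"
        elif name in SHELL_CALLS:
            hints[name] = "may run a shell command"
    return hints


if __name__ == "__main__":
    print(hidden_dependency_hints(["printf", "dlopen@GLIBC_2.1", "execvp"]))
```

Each hit is just a pointer to the place in the source code where you should go and read what gets loaded or executed.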
All in all, even for non-scripting programs, you really need to pay attention to the sources to be sure that you got your dependencies right, and you should never ever rely purely on the output of a script. Which is another reason why I think that most work in Gentoo cannot be fully automated, not just yet at least. At any rate, I’m hoping to provide developers with a usable script one day soonish; at least it’ll be a step closer than it is now.
I’m going to get rid of Quassel in the next days unless something drastically changes, but since I really think that Sput was doing a hell of a good job, I’d like to point out what the problems are in my opinion.
There’s nothing wrong with the idea (I love it) nor with the UI (it’s not bad at all); having it be cross-platform also helps a lot. What I really feel is a problem, though, is the creeping in of dependencies in it. Which is not Sput’s fault for the most part, but it is a good example of why I think Qt and KDE development is getting farther and farther from what I liked about it in the past.
With KDE, the last straw was when I noticed that to install Umbrello I had to install Akonadi, which in turn required me to install MySQL. I don’t use MySQL for myself; I used it for a couple of web development jobs, but I’d really like for it to stay stopped since I don’t need it on a daily basis. On the other hand I have a running PostgreSQL I use for my actual work, like the symbol collision analysis. I doubt that it would have required me to start MySQL or Akonadi to run Umbrello, but the problem was with the build system. Just like the KDE guys bastardised autotools into one of the most overcomplex build systems that man was able to create in the KDE 3 series, they have made CMake even worse than it would be as released by Kitware (which, on the other hand, somehow seemed to make it a bit less obnoxious; not that I like it any better, but if one has the major need of building under Windows, it can be dealt with better than some custom build systems I’ve seen).
So the new KDE4 build system seems to pick up the concept of shared checks from KDE3, which basically turns out to be a huge amount of checks that are unneeded for most software but will be executed by all of it, just because trying to actually split the “modules” into per-application releases, like GNOME does already, is just too difficult for SuSE, sorry, KDE developers.
This time the dependency creep hit Quassel badly. The recent releases of Quassel added a dependency on qt-webkit to show a preview of a link when posted in IRC. While I think this is a bad idea (because, for instance, if there was a security issue in qt-webkit, it would be tremendously easy to get users to load the page), and it still has implementation issues when the link points to a big binary file rather than a webpage or an image, it can be considered a useful feature so I never complained about it.
Today after setting up the new disks the update proposed by portage contained an interesting request of installing qt-phonon. Which I don’t intend to install at all! The whole idea of having to install phonon for an application like Quassel is just out of my range of acceptable doings.
I was the first one to complain that GNOME required/requires GStreamer, but thanks to Lennart’s efforts we now have an easy way to play system sound without needing GStreamer, on the other hand, KDE is still sticking with huge amount of layers and complex engines to do the easiest of the tasks. I’m not saying that the ideas behind Solid and the like are entirely wrong, but it does feel wrong for them to be KDE-only, just like it feels wrong for other technologies to be GNOME-only. Lennart’s libcanberra shows that there is space for desktop-agnostic technologies implementing the basic features needed by all of them, it just requires work and coordination.
So now I’m starting up Quassel to check on my messages and then I’ll log it out, after installing X-Chat or something.
In the past years I picked up more than a couple of “battles” to improve Free Software quality all over. Some of these were controversial, like --as-needed and some of them have been just lost causes (like trying to get rid of C++ strict requirements on server systems). All of those though, were fought with the hope of improving the situation all over, and sometimes the few accomplishments were quite a satisfaction by themselves.
I always thought that my battle for --as-needed support was going to be controversial because it does make a lot of software require fixes, but strangely enough, this has been reduced a lot. Most of the newly released software works out of the box with --as-needed, although there are some interesting exceptions, like GhostScript and libvirt. On the positive side, there is for instance Luis R. Rodriguez, who made a new release of crda just to apply an --as-needed fix for a failure that was introduced in the previous release. It’s very refreshing to see that nowadays maintainers of core packages like these are concerned with these issues. I’m sure that when I started working on --as-needed nobody would have made a new point release just to address such an issue.
This makes it much more likely for me to work on adding the warning to the new --as-needed, and makes it even more necessary for me to find out why ld fails to link PulseAudio libraries even though I’d have expected it to.
Another class of changes that I’ve been working on that have shown more interest around than I would have expected is my work on cowstats which, for the sake of self-interest, formed most of the changes in the ALSA 1.0.19 release for what concerns the userland part of the packages (see my previous post on the matter).
On this matter, I wish first to thank notadev for sending me Linkers and Loaders, which is going to help me improve Ruby-Elf more and more; thanks! And since I’m speaking of Ruby-Elf, I finally decided its fate: it’ll stay. My reasoning is that first of all I was finally able to get it to work with both Ruby 1.8 and 1.9 by adding a single thin wrapper (that is going to be moved to Ruby Bombe once I actually finish that), and most importantly, the code is there, I don’t want to start from scratch, there is no point in that, and I think that both Ruby 1.9 and JRuby can improve from each other (the first losing the Global Interpreter Lock and the other one trying to speed up its starting time). And I could even decide to find time to write a C-based extension, as part of Ruby-Bombe, that takes care of byteswapping memory, maybe even using OpenMP.
Also, Ruby-Elf has been serving its purpose a lot with the collision detection script, which is hard to move to something different since it really is a thin wrapper around PostgreSQL queries, and I don’t really like to deal with SQL in C. Speaking about the collision detection script, I stand by my conclusion that software sucks.
Unfortunately, while there are good signs on the issue of bundled libraries, like Lennart’s concerns with the internal copies of libltdl in both PulseAudio (now fixed) and libcanberra (also staged for removal), the whole issue is not solved yet: there are still packages in the tree with a huge amount of bundled libraries, like Avidemux and Ardour, and more scream to enter (and thankfully they don’t always do). If you’d like to see the current list of collisions, I’ve uploaded the LZMA-compressed output of my script. If you want you can clone Ruby-Elf and send me patches to extend the suppression files, to remove further noise from the file.
At any rate I’m going to continue my tinderboxing efforts, while waiting for the new disks, and work on my log analyser again. The problem with that is I really am slow at writing Python code, so I guess it would be much easier if I were to reimplement the few extra functions that I’m using out of Portage’s interface in Ruby and use those, or find a way to interface with Portage’s Python interface from Ruby. This is probably a good enough reason for me to stick with Ruby, sure Python can be faster, sure I can get better multithreading with C and Vala, but it takes me much less time to write these things with Ruby than it would take me in any of the other languages. I guess it’s a problem with the mindset.
And on the other hand, if I have problems with Ruby I should probably just find time to improve the implementation; JRuby is enough evidence to show that my beef with the Ruby 1.9 runtime not supporting multithreading is an implementation issue and not a language issue.
Following my previous blog post about un-released autoconf I wanted to write a bit about an unreleased change in binutils’ ld, that Sébastien pointed me at a few days ago. Unfortunately, since things piled up, the code is now actually released, and I briefly commented about it in the as needed by default bug. The change is only in the un-keyworded snapshot of pre-2.20 binutils so it’s not released to users, which makes it worth commenting on beforehand anyway.
The change is as follows:
--as-needed now links in a dynamic library if it satisfies undefined symbols in regular objects, or in other dynamic libraries. In the latter case the library is not linked if it is found in a DT_NEEDED entry of one of the libraries already linked.
If you know how --as-needed works and the ELF-related terms, you should already be able to guess what it’s actually doing. If you’re not in the know with this, you should probably read again my old post about it. Basically the final result of this is that the first situation:

gets expanded in the wished linking situation:

instead of the broken one that wouldn’t work.
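To make the rule quoted above a bit more tangible, here is a toy model of it in Python. It is only my reading of that NEWS entry, not ld’s actual algorithm: it looks at libraries in command-line order and knows nothing about symbol versions, weak symbols or archives. The Lib class, the function names, the sonames and the symbol names are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Lib:
    soname: str
    defines: set                                   # symbols the library exports
    undefined: set = field(default_factory=set)    # symbols it leaves unresolved
    needed: set = field(default_factory=set)       # its own DT_NEEDED entries


def _rescued(lib, kept):
    """Second half of the rule: needed by a library that was already linked."""
    # Not linked if an already linked library names it in DT_NEEDED.
    if any(lib.soname in other.needed for other in kept):
        return False
    return any(lib.defines & other.undefined for other in kept)


def linked_libraries(object_undefined, libs, soft=True):
    """Sonames that end up as DT_NEEDED; soft=False models the old behaviour."""
    kept = []
    for lib in libs:
        if lib.defines & object_undefined:
            kept.append(lib)       # satisfies a regular object: always kept
        elif soft and _rescued(lib, kept):
            kept.append(lib)       # satisfies another dynamic library
    return [lib.soname for lib in kept]


if __name__ == "__main__":
    # Hypothetical names: libA uses a symbol from libB without declaring it.
    lib_a = Lib("libA.so.0", {"a_func"}, undefined={"b_func"})
    lib_b = Lib("libB.so.0", {"b_func"})
    print(linked_libraries({"a_func"}, [lib_a, lib_b], soft=False))
    print(linked_libraries({"a_func"}, [lib_a, lib_b], soft=True))
```

With soft=False the second library is dropped even though the first one cannot work without it; with soft=True it is kept, unless the first library already declares it in its own DT_NEEDED.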
This is all good, you’d expect, no? I have some reservations about it. First of all, the reason for this change is to accommodate the needs of virtual implementation libraries like blas and similar. In particular the thread refers to the requirement of gsl not to link its blas implementation, leaving that to the user linking the final application. While I agree that’s a desired feature, it has to be noted that all the libraries need to keep the same ABI, otherwise just changing it on the linker call is not going to work. Which means that you can technically change the implementation by using the LD_PRELOAD environment variable to interpose the new symbols at runtime, switching the implementation without having to relink anything.
Of course, using LD_PRELOAD is not very handy, especially if you want to do it on a per-command basis or anything like that. But one could probably wonder “Why on Earth didn’t someone think of a better method for it before?” and then answer to himself after a bit of searching “Someone already did!”. Indeed a very similar situation arose in the FreeBSD 5 series, since there were multiple PThread implementations available. Since the ABI of the implementations is the same, they can be switched both at link editing time and at runtime linking. And to make it easier to switch it at runtime, they created a way to configure it through the /etc/libmap.conf file.
Indeed, the method used to choose between different PThread implementations under FreeBSD before the introduction of libmap.conf was the same that gsl wants to use. The result already showed that --as-needed was unusable on FreeBSD because of a similar problem: libpthread was never added to the dependencies of any library and was supposed to be linked into the final executable, which might not have any requirement for PThread by itself.
So basically the whole reasoning for softening up --as-needed is to allow working around a missing feature in the GNU runtime linker. Which is what, to me, makes it wrong. Not wrong in the sense of the wrong thing to do, but the wrong reason to do it. But it’s not that simple. Indeed this change means that there will be far fewer build failures with --as-needed, making it much more likely to become part of the default options of the compiler once binutils 2.20 is released. On the other hand, I think I’ll submit a patch for ld to warn when the new code is triggered.
My reasoning is quite simple: libraries should, as much as possible, be linked completely so that all their dependencies are well stated, especially since leaving them to the final executable to link can create a huge mess (think of the final executable being linked on a system where a different, ABI-incompatible, dependency is present: it will have subtle problems running, like unexpected crashes and the like). Also, if ten executables need a single library which forgets to state its dependency on, just as an example, libexpat, you get ten links to libexpat that need to be re-created (while the original library will not be picked up at all, by the way, so it will still expect the ABI of the previous version), rather than just one.
Since indeed the softer --as-needed makes it much simpler to enable it by default, I think it’s not a good idea to revert from the new behaviour, but having a warning that would say something like “-lexpat salvaged for -lfoo” would make it easy to identify the issue and assess on a case by case basis whether this is an intended situation or just a bug. So that the latter can be corrected.
On the other hand I also have a case of failure with recursive linking, coming out of the next PulseAudio release, which I need to get fixed, hopefully before PulseAudio is released.
Although I’ve written about a possible cataclysm related to autoconf-2.64 to warn about the next release, it’s easy for users not to understand why the fuss about it. Especially, Gentoo users have come to know that changes, especially in autotools, tend to be quite disruptive and might actually start asking themselves why people go through that at all, considering that each time a new autoconf, automake, libtool or whatever is released we have to go round and round to fix the issues.
For this reason, I’ve decided to put the upcoming changes in the perspective of users, to let them understand that all the work that is going on for this is going to be quite useful to them, on the longish run. I say longish because the change I’ve blogged about, the handling of present-but-not-compilable headers, is something that has been in the making since at least 2.59, which was already out when I joined Gentoo the first time; just to give a timeframe, this was about three years ago if not more (a quick check on the ChangeLog file dates the start of the transition to 2001-08-17, that’s almost eight years ago!).
The change was done with quite a good technical reason: just checking if a header is present is of no help. Even though the check is not just a stat() call like CMake does, and it does go through the preprocessor (which in turn makes it possible to consider a header that is found but not usable as not present, like malloc.h on Mac OS X), the developers most likely want to know if the header can be used in their project. Which means it has to work with the compiler they are using, and with the options they are enabling.
Since changing the behaviour between one version and the other wouldn’t have given enough time to people to actually convert their code to check properly for header usability, for a while autoconf-generated configure files checked both that the header was present (through the preprocessor) and that it was usable (through the compiler). This, though, creates long, boring and slow configure files because it checks for more stuff than needed: for each header file in an AC_CHECK_HEADERS macro, there are two processes spawned: preprocessor and compiler. As you might guess, this gets tremendously boring on projects that check just shy of a hundred header files.
While the 2.64 version still checks for both preprocessor and compiler, and warns in case the compiler rejects a header that the preprocessor accepted and vice-versa (the compiler always winning now), hopefully we won’t have to wait till 2017 to have just one test per header in the configure output, which will finally mean shorter, slimmer, faster configure scripts.
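Since the disagreement between preprocessor and compiler is reported with a fixed message (“present but cannot be compiled”), build logs can be screened for the headers that need attention. Here is a sketch in Python, with a function name made up for illustration and a sample log in the shape configure prints for this case; it only looks at that one message and ignores everything else in the log.

```python
import re

# The line configure prints when the preprocessor accepts a header
# that the compiler then rejects.
_WARNING = re.compile(
    r"^configure: WARNING: (?P<header>\S+): present but cannot be compiled$"
)


def headers_needing_attention(configure_output):
    """Headers reported as present but not compilable, in order of appearance."""
    headers = []
    for line in configure_output.splitlines():
        match = _WARNING.match(line.strip())
        if match and match.group("header") not in headers:
            headers.append(match.group("header"))
    return headers


SAMPLE_LOG = """\
checking bluetooth/bluetooth.h usability... no
checking bluetooth/bluetooth.h presence... yes
configure: WARNING: bluetooth/bluetooth.h: present but cannot be compiled
configure: WARNING: bluetooth/bluetooth.h: check for missing prerequisite headers?
configure: WARNING: bluetooth/bluetooth.h: proceeding with the preprocessor's result
"""

if __name__ == "__main__":
    print(headers_needing_attention(SAMPLE_LOG))
```

Every header listed this way is one for which the configure script will behave differently once the compiler’s verdict takes precedence.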
Another interesting change in the 2.64 release, which makes it particularly sweet to autotools fanatics like me, is the change in AC_DEFUN_ONCE semantics that makes it possible to define macros that are executed exactly once. The usefulness of this is that oftentimes you get people writing bad autoconf code that, instead of using AC_REQUIRE to make sure a particular macro has been expanded (which is usually the case for macros using $host and thus needing AC_CANONICAL_HOST), simply calls it, which means the same check is repeated over and over (with obvious waste of time and increase in size of the generated configure file).
Thanks to the AC_DEFUN_ONCE macro, not only is it finally possible to define macros that never get executed more than once, but also most of the default macros that are supposed to work that way, like AC_CANONICAL_HOST and its siblings, are now defined with that, which means that hopefully even untouched configure files will be slimmed down.
Of course, this also means there are more catches with it, so I’ll have to write about them in the future. Sigh, I wish I could find more time to write on the blog since there are so many important things I have to write about, but I don’t have enough time to expand them to a proper size since I’m currently working all day long.
Today I was looking around for a bug in autoconf, and I noticed one interesting bit out of the NEWS file of the current git version:
Present But Cannot Be Compiled: Autoconf will now proceed with the compiler’s result if a header is present but cannot be compiled. The warning is still printed, and you should really fix it by providing a fourth parameter to AC_CHECK_HEADER/AC_CHECK_HEADERS.
This is a tremendously useful thing to know before autoconf 2.64 is released, which is hopefully not too soon. The reason is that finally, after years of having that as a warning, to the point that some projects ignored it altogether, the new autoconf will start ignoring header files that cannot be compiled, for whatever reason. This is useful since it ensures that headers lacking their proper dependencies are not detected. Unfortunately this also means that any software that currently relies on header files found without compilation will change behaviour. In particular, warnings like the following need to be addressed before the packages get to use autoconf-2.64:
checking bluetooth/bluetooth.h usability... no
checking bluetooth/bluetooth.h presence... yes
configure: WARNING: bluetooth/bluetooth.h: present but cannot be compiled
configure: WARNING: bluetooth/bluetooth.h: check for missing prerequisite headers?
configure: WARNING: bluetooth/bluetooth.h: see the Autoconf documentation
configure: WARNING: bluetooth/bluetooth.h: section "Present But Cannot Be Compiled"
configure: WARNING: bluetooth/bluetooth.h: proceeding with the preprocessor's result
configure: WARNING: bluetooth/bluetooth.h: in the future, the compiler will take precedence
checking for bluetooth/bluetooth.h... yes
Just so you know, this output comes from the kdepim-3.5 build, which means that the old KDE build system is once again screwed. (It’s very useful to change build system when you can’t even use one properly: the way KDE’s CMake-based build system fails at finding Ruby shows that even without autotools, upstream can make a huge mess, like depending on akonadi to be able to install Umbrello…)
The list of packages showing these warnings is actually not too long, which means it should take little time to fix them. The problem is that I seem to remember the output message being slightly different in the past, which means my grep might not have hit everything; I’ll have to investigate that more deeply. Also, these are only the problems identified in Gentoo Linux; I know for sure that there are many more in Gentoo/FreeBSD, since the FreeBSD code does not express all the implicit dependencies between the headers (which is something that I sincerely can’t understand from time to time).
Unfortunately, there is no straight and final recipe to fix these problems, since they can easily be caused by various, entirely different issues. For instance it can be a header that is not included when it should be (the most common case, which is what the autoconf NEWS file reported), but it can also be a C99 header included using a compiler set to C89, like in the case above. Indeed, if you check the config.log file for the above build you’ll see this:
configure:34904: checking bluetooth/bluetooth.h usability
configure:34921: i686-pc-linux-gnu-gcc -c -std=iso9899:1990 -W -Wall -Wchar-subscripts -Wshadow -Wpointer-arith -Wmissing-prototypes -Wwrite-strings -D_XOPEN_SOURCE=500 -D_BSD_SOURCE -DNDEBUG -O2 -O2 -pipe -Wformat-security -Wmissing-format-attribute -DQT_THREAD_SUPPORT -D_REENTRANT conftest.c >&5
In file included from conftest.c:97:
/usr/include/bluetooth/bluetooth.h:117: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'int'
/usr/include/bluetooth/bluetooth.h:121: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
And the two lines contain these two definitions:
/usr/include/bluetooth/bluetooth.h:117:static inline int bacmp(const bdaddr_t *ba1, const bdaddr_t *ba2)
/usr/include/bluetooth/bluetooth.h:121:static inline void bacpy(bdaddr_t *dst, const bdaddr_t *src)
The inline keyword is not available in the C standard requested by KDE’s build system, and thus using that header is not correct. But for the sake of finding a solution, it’s very well possible that most KDE packages checking for bluetooth.h could be made to not check for it at all (the way the KDE build system checks for stuff for single modules in the main configure file is probably one of the most obnoxious mistakes in that abomination).
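For the more common case mentioned earlier, a header that simply lacks a prerequisite include, the fix the NEWS entry asks for is the fourth parameter. A minimal sketch, borrowing the net/if.h and sys/socket.h pair that the Autoconf manual uses as its example (this is not the fix for the bluetooth.h case above, which is a language-standard problem):

```m4
dnl Check the prerequisite first, so HAVE_SYS_SOCKET_H is defined
dnl by the time the dependent header is tested.
AC_CHECK_HEADERS([sys/socket.h])

dnl The fourth argument is prepended to the test program, so the
dnl compiler sees the prerequisite before net/if.h itself.
AC_CHECK_HEADERS([net/if.h], [], [],
[[#ifdef HAVE_SYS_SOCKET_H
# include <sys/socket.h>
#endif
]])
```

Which headers go into the fourth argument depends entirely on the header being tested; the config.log output is usually the quickest way to find out what is missing.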
Now I guess it’s time to start preparing bugs so that we are not unprepared for when this will actually be enacted!