OSGalaxy

published by flameeyes@gmail.com (Diego E. "Flameeyes" Pettenò) on 2008-11-21 23:41:00 in the "Gentoo" category

Well, maybe not right now ..

Since I doubt you’d be able to understand what I mean from the lolcat-speech title, let me try to summarise it in a language that nears a lot more what people actually speak (yes I know it’s still going to be too technical for some of the readers, but I guess that cannot really be helped).

Last night I couldn’t sleep, for a series of reasons; not least that, to make sure I could implement some stuff for my job while waiting for the definitive specs, I drank three cups of coffee, which makes me feel very nice but stops me from sleeping. That is not so nice when your neighbours have woken you up two days in a row with their fighting, but I can manage.

Since at the time I was waiting for the chroot to complete some builds so I could check and submit a few more bugs (the “My Bugs” search on bugzilla now counts 1200 bugs, with some to spare), I decided to try something different. In my git repositories I have already been changing the build systems of a few libraries I contributed to in the past so that --no-undefined is added, so last night I decided to do some work on the ALSA upstream repositories.

I had already checked out three of the repositories when 1.0.18 was added to the tree, since I had to fix an --as-needed issue and decided to just go on and submit all the patches upstream for merge. This time I checked out alsa-lib, added --no-undefined, and then started some analysis with the ruby-elf tools cowstats and missingstatic; I also removed a few compiler warnings, just to make sure I wouldn’t be distracted by faux problems.

The result should now be that the alsa-lib and alsa-plugins libraries have a few dirty pages less, and that the code is a bit more solid than before, with static and const modifiers added where needed. It wasn’t much work, but I forgot once again to pass -s to git commit, so I had to rewrite history to get the Signed-off-by header onto all the commits; if somebody knows how to set git per-repository to always use -s when committing, I’d be very glad to hear it.

On the other hand, this task showed me that cowstats still had, and has, some problems; in particular, it lacked a way to separate the data in .data.rel sections from that in .data.rel.ro. It is important to distinguish between the two since .data.rel.ro is fully prelinkable, which means that after a prelink it would always be loaded from the disk without further relocation, while .data could still cause copy on write because it can be changed at runtime even after relocation.

This becomes even clearer when you notice that shared objects built with GCC under Linux have .data and .data.rel.ro, but no .data.rel, which is instead merged back into .data itself. Because of this, the “real” data count in cowstats is entirely out of touch with reality; I’ll most likely have to rewrite that part.

Anyway, I’ve done my best and hopefully tomorrow more of my patches will be merged in, so that alsa-lib’s dirty pages get reduced again. Unfortunately even after my changes, with all the plugins enabled, and in the worst case scenario, libasound.so will go on requiring more than 28KiB of dirty pages per process (minus forks and various preloads). Which is not nice at all. Prelinking can reduce the dirty pages removing these 28KiB (which are all of .data.rel.ro), and then it would just require a couple of pages.
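For those who want to check their own libraries, here is a minimal sketch of how the size of .data.rel.ro can be turned into a page count. It assumes the output format of binutils’ size -A (section name, size in bytes, address) and 4KiB pages; the helper names section_size and pages_for are made up for this example and are not part of ruby-elf.

```shell
# section_size NAME: read `size -A FILE` output on stdin and print the size
# in bytes of section NAME (prints 0 if the section is not there).
section_size() {
  awk -v sec="$1" '$1 == sec { size = $2 } END { print size + 0 }'
}

# pages_for BYTES [PAGESIZE]: number of pages needed to hold BYTES,
# rounding up; PAGESIZE defaults to 4096 (an assumption, check getconf PAGESIZE).
pages_for() {
  echo $(( ($1 + ${2:-4096} - 1) / ${2:-4096} ))
}
```

It could be used as size -A /usr/lib/libasound.so | section_size .data.rel.ro, passing the result to pages_for; a section of 28672 bytes, for instance, spans 7 pages of 4096 bytes.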

There is one question though that now is driving me nuts: hasn’t Nokia worked on ALSA for their N800 tablets? I know alsa-plugins has a Maemo plugin (which I also cleaned up a bit last night, as it had quite a few hacks on the autotools side, and an unwarranted use of pow() instead of using left shift), but I’d expect Nokia to know better about having so many dirty pages…

Anyway, for all the remaining users, I strongly suggest you look into removing some of the plugins that ship with ALSA, like the iec958 plugin if you don’t use digital pass-through. By cutting down the number of built-in plugins you should be able to noticeably reduce the memory that ALSA, and the applications using it, take up on your system.

I also wonder why I didn’t add a USE flag to disable the alisp feature. (Sorry, of course I wouldn’t be able to find an alisp USE flag if I check the output of emerge -pv alsa-utils. D’oh!) Why does ALSA need a LISP interpreter anyway?




published by flameeyes@gmail.com (Diego E. "Flameeyes" Pettenò) on 2008-11-20 23:57:00 in the "Gentoo" category

In my previous post I’ve noted that there are some cases where --as-needed stops a program from building even though it’s not because of an indirect link. I like to call this class of failures the “misguided link” failures.

Consider the following diagram showing such a case:

[Diagram: the broken relationship between a program, libssl and libcrypto]

We have a given piece of software that links to libssl but actually uses libcrypto. This is the inverse of the indirect case I wrote about last time, but it still features a link relationship with no use relationship, which is going to be cut by --as-needed. This is one of the most interesting cases since it’s really difficult to identify without checking either the source code or the missing symbols. It’s not limited to the OpenSSL libraries, it’s actually pretty common in general, but it happens quite a lot with them since people forget that OpenSSL is more than just libssl.

So how can we identify this problem? Well, the first issue here is to identify what can cause it. Let’s say we have a simple program that calculates the MD5 of its standard input, something like this:

#include <stdint.h>
#include <stdio.h>
#include <openssl/md5.h>

int main() {
  MD5_CTX md5;
  uint8_t md5digest[MD5_DIGEST_LENGTH];
  int i;

  MD5_Init(&md5);

  while(!feof(stdin)) {
    char buff[4096] = { 0, };
    size_t read = fread(buff, 1, sizeof(buff), stdin);

    MD5_Update(&md5, buff, read);
  }

  MD5_Final(&md5digest[0], &md5);

  for(i = 0; i < sizeof(md5digest); i++)
    printf("%02x", md5digest[i]);

  printf("\n");
  return 0;
}

Now if we try to compile this on a system without forced --as-needed (and no --as-needed in LDFLAGS) linking it with -lssl, it will work just fine

% GCC_SPECS="" gcc md5-ssl.c -o md5-ssl -lssl
% scanelf -n md5-ssl 
 TYPE   NEEDED FILE 
ET_EXEC libssl.so.0.9.8,libc.so.6,libcrypto.so.0.9.8 md5-ssl 
% ldd md5-ssl 
    linux-vdso.so.1 =>  (0x00007fff11bfe000)
    libssl.so.0.9.8 => /usr/lib/libssl.so.0.9.8 (0x00007f070961e000)
    libc.so.6 => /lib/libc.so.6 (0x00007f07092ab000)
    libcrypto.so.0.9.8 => /usr/lib/libcrypto.so.0.9.8 (0x00007f0708f19000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f0709870000)
    libdl.so.2 => /lib/libdl.so.2 (0x00007f0708d15000)

but if we try to compile it with forced --as-needed, or even just --as-needed in LDFLAGS, the results are quite different:

% gcc md5-ssl.c -o md5-ssl -lssl   
/tmp/.private/flame/cc8kRKqi.o: In function `main':
md5-ssl.c:(.text+0x10): undefined reference to `MD5_Init'
md5-ssl.c:(.text+0x8d): undefined reference to `MD5_Update'
md5-ssl.c:(.text+0xae): undefined reference to `MD5_Final'
collect2: ld returned 1 exit status
% GCC_SPECS="" gcc -Wl,--as-needed md5-ssl.c -o md5-ssl -lssl 
/tmp/.private/flame/ccVWCirl.o: In function `main':
md5-ssl.c:(.text+0x10): undefined reference to `MD5_Init'
md5-ssl.c:(.text+0x8d): undefined reference to `MD5_Update'
md5-ssl.c:(.text+0xae): undefined reference to `MD5_Final'
collect2: ld returned 1 exit status

A lot of people at this point would be thrown off since the library is there, after the source files (or object files), there are no commodity libraries involved, so the linking line should be correct. But instead it fails, and the problem lies in using the wrong library.

As the name tells you, libssl contains the functions that implement the Secure Sockets Layer; while MD5 is used by that implementation, the MD5 functions are not part of the library’s interface.

Now, since even the man page for these functions does not tell you which library to find them in (while most Linux, *BSD and Solaris man pages tell you which library a function comes from), you have to rely on either experience or testing to find the correct library.

Let’s try two different approaches here, just so that people can understand how I end up debugging these things in the first place.

To begin with, let’s check whether libssl provides the symbols we’re missing, we don’t expect it to since the link failed; easy way to do this? nm and grep:

% nm -D /usr/lib/libssl.so | egrep 'MD5_(Init|Update|Final)'
%

There is no defined nor undefined symbol with those names, which means there is no MD5 interface defined nor used in that library. Which explains why the link failed. Now since we know the build works without --as-needed we check which library libssl brings in as dependencies:

% ldd /usr/lib/libssl.so
    linux-vdso.so.1 =>  (0x00007fff1dbfe000)
    libcrypto.so.0.9.8 => /usr/lib/libcrypto.so.0.9.8 (0x00007f2d1551d000)
    libc.so.6 => /lib/libc.so.6 (0x00007f2d151aa000)
    libdl.so.2 => /lib/libdl.so.2 (0x00007f2d14fa5000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f2d15b28000)

The first library is the virtual dynamic shared object of the Linux kernel, let’s ignore it; the last is the dynamic linker (or loader) itself, which we also want to ignore; we can exclude libc or otherwise the program wouldn’t have failed, since that’s always brought in. We’re left with two candidates: libdl and libcrypto. Now let’s be very dumb and ignore the name “crypto”, as well as ignoring that libdl is home of dlopen() and other known functions, and look in the two of them for the symbols:

% nm -D --defined-only /lib/libdl.so.2 | egrep 'MD5_(Init|Update|Final)'
% nm -D --defined-only /usr/lib/libcrypto.so.0.9.8 | egrep 'MD5_(Init|Update|Final)'
000000000006a5e0 T MD5_Final
000000000006a5a0 T MD5_Init
000000000006a6e0 T MD5_Update

So we found the problem, and indeed you can verify yourself that requesting -lcrypto directly when building the program above makes it work just fine with and without --as-needed, with the added benefit that libssl is not loaded when running the software.
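The manual steps above can be sketched as a small shell helper. This is only an illustration: it assumes that ldd and GNU nm are available, and the function names (ldd_paths, defines_symbol, find_provider) are invented for this example.

```shell
# ldd_paths: read `ldd` output on stdin and print the resolved library paths.
# Lines without a "=> /path" part (the vDSO, the loader itself) are skipped.
ldd_paths() {
  awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# defines_symbol SYM: read `nm -D --defined-only` output on stdin and
# succeed if SYM is listed (any @VERSION suffix is ignored).
defines_symbol() {
  awk -v sym="$1" '{ s = $NF; sub(/@.*/, "", s); if (s == sym) found = 1 }
                   END { exit !found }'
}

# find_provider LIB SYM: print each dependency of LIB that defines SYM.
find_provider() {
  ldd "$1" | ldd_paths | while read -r lib; do
    if nm -D --defined-only "$lib" | defines_symbol "$2"; then
      echo "$lib"
    fi
  done
}
```

On a system like the one shown above, find_provider /usr/lib/libssl.so MD5_Init would be expected to print the path of libcrypto.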

Now this is a slightly boring and long approach; the alternative approach, which works just fine in Gentoo, requires just one command:

% scanelf -ql -s +MD5_Init
MD5_Init  /usr/lib64/libcrypto.so.0.9.8
MD5_Init  /usr/lib64/libgnutls-openssl.so.26.11.3

The scanelf call we have here will go searching for the library we need, although the output might confuse you, since it may report different implementations or totally unrelated libraries in case of symbol collisions (which is something I use to identify broken software, by the way). Note that here I targeted just one symbol; the reason is that the current version of scanelf, from 0.1.18, does not work properly with regex-based search. With the current CVS version you could use scanelf -gqls 'MD5_(Init|Update|Final)', but it would just find the first anyway.

Is this easy enough to fix, in your opinion? Also consider that if the software were to use pkg-config, right now it would get -lssl -lcrypto -ldl, which would stop --as-needed from breaking the build, but this is most likely going to break in the future if libssl.pc is updated to use Requires.private to list libcrypto.




published by flameeyes@gmail.com (Diego E. "Flameeyes" Pettenò) on 2008-11-19 20:31:00 in the "Gentoo" category

I hope the images appear correctly, please just leave me a comment if they don’t and I’ll try to fix them.

I think that after my writeup and Robert’s bugspree some people might have the wrong idea about the relationship between the --as-needed and the --no-undefined flags.

Let’s begin by saying exactly what --no-undefined does: it makes the linker refuse to build targets that have undefined references not satisfied by any of the libraries they link to, directly or indirectly. The linker already does this for final executables, but for a series of reasons the default is to allow undefined references in shared objects. So if you have a library A that calls functions from library B but does not link to it directly, with --no-undefined the linker will refuse to build A entirely; by default A would still build, and the software SW that is going to use A would be forced to link in B explicitly. The following image shows what I mean in the form of a graph:

[Diagram: the indirect linking problem]

In the above image you can see that the “Use relationship” and the “Link relationship” are not balanced, and here comes the problem, since the task of --as-needed is to remove the link relationships that are not paired with a use relationship. Now, just to make sure everybody is on the same page, the reason why --as-needed exists is that, thanks to the original conception of libtool, pkg-config and others, we can easily have programs whose linking diagram is something like the following:

[Diagram: a complex and overextended linking of a program]

If you look carefully you can see that there are some linking branches that are not actually used at all; this is because the linker, by default, links in whatever you tell it to link, so you can easily link into a program some libraries that are never ever used. This is not only a waste of time at build time: it also wastes time and resources when the program is loaded, since the dynamic linker may have to take care of relocations, symbol resolution takes longer because the extra libraries need to be scanned too, and the libraries themselves might take up resources if, for instance, they have constructor functions, which cause initialisation code to be executed that might open data files, allocate data structures and so on.

Now, what we’d like would be something like the following diagram, which shows the linking reduced to the actual needed parts, still following the same rules as the original:

[Diagram: the hoped-for pruned linking of a program]

This would reduce the objects loaded to the minimal, needed set, so that no foreign object gets loaded at runtime, or linked in at build time for that matter. Unfortunately, this is a dream: the way the linker works, --as-needed produces a different result from what you see here; it produces this:

[Diagram: the linking of a program as actually pruned by --as-needed]

I’ve changed the colours of the objects in this diagram: the yellow objects are the ones that lack a linking relationship, and thus won’t be loaded, while the red objects are those which are broken, since they use an object they don’t link to. You can see that there actually is an exception to that rule, since the yellow object in the middle of the graph uses the blue one at the right end of it, but doesn’t link directly to it; while it’s probably a good idea to make sure that all you use is also linked in, that situation is legal since the link to the highest object is available indirectly from another object.

Now, --no-undefined would have caught the two broken objects at their build time rather than when they were to be used by another project; as I’ll try to explain in the next few days, this option is not a panacea, but it helps to identify the issues further up the stream. On the other hand, there are some situations where --as-needed finds trouble that --no-undefined wouldn’t identify early on.
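To see the difference first-hand, here is a minimal sketch that builds a shared object calling a function it never links against, first with the linker defaults and then with --no-undefined. It assumes a Linux system with GCC and a GNU-compatible linker; the function demo_no_undefined and the names a_func, b_func and liba.so are all made up for the example.

```shell
# demo_no_undefined: build a library that calls b_func() without linking
# the library that would provide it, with and without --no-undefined.
demo_no_undefined() {
  tmp=$(mktemp -d) || return 1
  printf '%s\n' 'extern int b_func(void);' \
                'int a_func(void) { return b_func(); }' > "$tmp/a.c"

  # Linker defaults: undefined references are allowed in shared objects.
  if gcc -shared -fPIC "$tmp/a.c" -o "$tmp/liba.so" 2>/dev/null; then
    echo "default: built"
  else
    echo "default: rejected"
  fi

  # Same build, but undefined references are now an error.
  if gcc -shared -fPIC -Wl,--no-undefined "$tmp/a.c" -o "$tmp/liba.so" 2>/dev/null; then
    echo "no-undefined: built"
  else
    echo "no-undefined: rejected"
  fi

  rm -rf "$tmp"
}
```

Under those assumptions the first build should go through and the second should be rejected for the missing b_func; adding the library that defines b_func to the second command line is what would make it succeed.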

Since people often tell me I write way too much in a single blog entry, I’ll wait to add the rest of the content until this entry has been digested; the next chapter will hopefully come in two days. In the meantime feel free to write any question you have in the comments, so I can answer either in the comments (as you may guess I read them, but I don’t always have time to address them directly) or in the next posts.

By the way, the speed at which I can write these articles depends directly on the amount of caffeine in my bloodstream, so if you wish to have more content written faster, you can always help me by getting me some good coffee beans; I have never tried Java, for instance.




published by flameeyes@gmail.com (Diego E. "Flameeyes" Pettenò) on 2008-11-18 02:41:00 in the "Gentoo" category

Even though I’m spending most of my time working on paid jobs, I’ve become active in Gentoo again, although lately I’m mostly doing test builds for --as-needed. Yamato, with its 8-core horsepower, is still building the tree; when I left it to go relax a bit in my bedroom, it was finishing the game-arcade category. I’ve been committing the most trivial stuff (missing deps, broken patches, and stuff like that), and opening bugs for the rest. The result is that the “My Bugs” search on Gentoo’s bugzilla reports over one thousand bugs.

Also, tonight I spent the night fixing the patches currently in tree so that absolute paths are replaced by relative ones, since epatch now fails if you try to apply a patch that has absolute paths (because when they don’t add up you’d be introducing subtle bugs that might apply to users but not to you). The result has been an almost tree-wide commit spree that sanitised the patches so that they won’t fail to apply for users. It was a long boring and manual job but it was completed, and of that I’m happy.

But it’s not all well. As Mike (vapier) pointed out, trying to just report all failures related to -Wl,--no-undefined is going to produce a huge amount of bugspam for false positives. In cases like zsh plugins, you really can’t do much more than pass -Wl,--undefined to disable the previous option, which makes -Wl,--no-undefined too much of a hassle to be usable in the tree. On the other hand, it’s still a useful flag to ask upstream to adopt for their own builds, so that there are no undefined symbols. I think I’ll double-check all the software I contribute to and add this flag (with one caveat: this flag needs to be used only on Linux, since on other systems it might well be a problem; for instance, on BeOS dynamic patching is more than likely to cause problems, and on FreeBSD the pthread functions are not usually linked into libraries).

This, and the largefile problem I wrote about, brings me to wonder what we can do to improve even further the symbiosis between Gentoo and the various upstreams. I’m sure there are tons of patches in the tree that haven’t been sent, and I’m afraid that --as-needed patching will cause even more to be introduced. I wonder if there could be volunteers who spend time checking the patches package by package so that they are sent upstream, and checking the Unmaintained Free Software wiki so that if a package is not maintained by upstream anymore there are references to our patches if somebody wants to pick it up.

I could be doing this myself, but it takes time, and lately I haven’t had much; I could try to push myself further but I currently don’t see much of the point, since I sincerely have had very little feedback from users lately, besides the small stable group of users whom I esteem very much and who are always around when I need help. Even just a kudo on ohloh would be nice, to know my work is appreciated, you know. Anyway, if you’re interested in helping with submitting stuff upstream, please try to contact me, so I can see about writing down upstream references in the patches that we have in the tree.

Also, since I started working “full time” on --as-needed issues, I had to leave behind some things like closing some sudo bugs and some PAM issues, like the one with OpenSSH and double lastlogin handling. I hope to resume those as soon as I free some of my time from paid jobs (hopefully having some money to spare to pay for Yamato, which still ain’t completely paid for, and which at this pace is going to need new disks soon enough, considering the amount of space that the distfiles archive, as well as the built chroot, take). I actually hope that Iomega will send the replacement for my UltraMax soon, since I wanted to move music and video to that external drive to free up a few hundred gigabytes on the internal drives.

Once the build is completed, I’ll also have to spend time optimising my ruby-elf tools to identify symbol collisions. I haven’t run the tools in almost a year, and I’m sure that with all the stuff I merged in that chroot, I’m going to have more interesting results than the ones I had before. I already started looking for internal libraries, although just on account of exported symbols, which is, as you probably know, a very bland way to identify them. A much more useful way to identify those is by looking at the DWARF debug data in the files, with utilities like pfunct from the dwarves package. I haven’t built the chroot with debug information though, since otherwise it would have required much more on-disk space, and the system is already a few tens of gigabytes, without counting portage, distfiles, packages, logs or build directories.

In the meantime, even with the very bland search, I already found a few packages that bundle zlib, one of which bundles an old version of zlib (1.2.1) which as far as I remember is vulnerable to some security issues. Once again, having followed policy would have avoided the problem altogether, just like SDL bundling its own versions of Xorg libraries, which can now make xorg-server crash when an SDL application is executed. Isn’t it pure fun?

At any rate, for tonight I’m off, I did quite a lot already, it’s 4.30 am and I’m still not sleepy, something is not feeling right.




published by flameeyes@gmail.com (Diego E. "Flameeyes" Pettenò) on 2008-11-17 18:10:00 in the "Gentoo" category

This post is inspired by a post of Eric Sandeen, whose blog I read last night after discovering we share an interest in making software build in parallel.

A little background for those who don’t know the issue I’m going to talk about. Classically, inode numbers and offsets were 32-bit values, but as you might guess nowadays this cannot be true: files bigger than 2GB (the highest offset that a signed 32-bit value can represent) are quite common (just think of DVD images or, even better, of Blu-ray discs: 50GB is huge), and modern filesystems (as Eric points out: XFS, btrfs and ext4) have or might have 64-bit inode numbers. Since changing the size of types would have broken ABI compatibility, GNU libc, as well as other libraries, added support for the so-called “largefile” mode. In largefile mode, the standard file operations have types with 64-bit size. The way this is implemented is by replacing calls like open() or stat() with 64-bit variants, called open64() and stat64(). Other operating systems like FreeBSD broke ABI compatibility and only have 64-bit interfaces. On new systems that are natively 64-bit, like AMD64, the new 64-bit interface is enabled by default, so the 64-bit specific interface is not needed.

Now since the two interfaces are, well, different interfaces, the only moment when they can be switched is at build time; indeed, you need to pass some compiler defines so that the calls are replaced at build time, and thus make use of either the old or the new largefile interface. Most packages you can think of are probably using largefiles already, some conditionally, some unconditionally as needed, and some unconditionally, needed or not, just to be safe. The problem is that not all software can deal with largefile properly as it is.

The usual way to discover a package does not support largefile is watching it fail on a >2GB file. The problem is that it’s not so nice since it means you have to fix the problem when it becomes a problem, while it would be much better to be able to identify the problem earlier, so that it can be solved before it becomes a true problem. But Eric’s post has given me an idea; I asked him for the script (which you can find attached to this post if Typo is not going to do some funny thing) and I used the same logic to identify packages using 32-bit interfaces with scanelf after portage installs it.

This is not yet a complete test since I’m forcing it to work only on x86 systems (I wanted to exclude AMD64), and it only checks stat symbols; it should check open, read, write and all the other symbols too. More importantly, this is not going to work with the scanelf that you got installed by portage right now (0.1.18), since I had to fix it a bit to properly handle regexp matching and multiple symbols matching. So if you want to try this you’ll probably have to wait till I release a 0.1.19 version. At any rate, the code in the bashrc file is just the following, for now:

post_src_install() {
    scanelf -q -F "#s%F" -R -s '-xstat,-lxstat,-__fxstat' "${D}" > "${T}"/flameeyes-scanelf-stat64.log
    if [[ -s "${T}"/flameeyes-scanelf-stat64.log ]]; then
        ewarn "Flameeyes QA Warning! Missing largefile support"
        cat "${T}"/flameeyes-scanelf-stat64.log >/dev/stderr
    fi
}
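Roughly the same check can also be sketched with nm on a single binary. This is an approximation of the idea, not the hook above: it assumes 32-bit x86 binaries linked against a glibc that implements stat(), lstat() and fstat() through the __xstat, __lxstat and __fxstat entry points (with 64-suffixed variants for largefile builds), and the function name legacy_stat_refs is made up.

```shell
# legacy_stat_refs: read `nm -D --undefined-only FILE` output on stdin and
# print references to the non-largefile stat entry points. The symbol names
# are an assumption about glibc; the 64-bit variants end in "64" and are
# deliberately not matched.
legacy_stat_refs() {
  awk '{ s = $NF; sub(/@.*/, "", s) }
       s ~ /^__(l|f)?xstat$/ { print s }'
}
```

For instance, nm -D --undefined-only /usr/bin/someprogram | legacy_stat_refs (with someprogram standing in for whatever you want to inspect) would print nothing for a binary built with largefile support.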

Please don’t rush submitting bugs for these things though; these are useful to know and they should probably be fixed, but please send the patches upstream rather than directly to Gentoo, for now.




published by flameeyes@gmail.com (Diego E. "Flameeyes" Pettenò) on 2008-11-16 14:16:00 in the "Gentoo" category

Here comes another case study of fixing parallel make issues. In this case, I’m going to talk about a parallel make issue that does not cause the build to abort, but that forces serial make even when parallel make is requested.

If you look closely at the build messages coming out of various packages you might notice from time to time the warning “jobserver unavailable” coming from make. When that warning is printed, it means that GNU make is unable to properly handle parallel builds since it does not know how to discipline the build; for instance, this comes from the build of xfsprogs:

flame@yamato xfsprogs-2.10.1 % make -j16
=== include ===
gmake[1]: warning: jobserver unavailable: using -j1.  Add `+' to parent make rule.

I have to say that GNU make here is very nice with its messages: it does not simply say that the jobserver is unavailable, it also tells you that it is going to use -j1 and that you should add a plus sign to the “parent make rule”. But I guess most people wouldn’t know how to deal with this. Let’s look deeper.

The build system of xfsprogs is based on autoconf and libtool, but it’s custom made (which by itself caused me quite a few headaches in the past and I still loathe). It is also recursive just like automake based buildsystem, but how does it recurse? The main Makefile contains this:

default: $(CONFIGURE)
ifeq ($(HAVE_BUILDDEFS), no)
        $(MAKE) -C . $@
else
        $(SUBDIRS_MAKERULE)
endif

To find SUBDIRS_MAKERULE we have to dig a lot deeper, finally we can find it in include/buildmacros:

SUBDIRS_MAKERULE = \
        @for d in $(SUBDIRS) ""; do \
                if test -d "$$d" -a ! -z "$$d"; then \
                        $(ECHO) === $$d ===; \
                        $(MAKEF) -C $$d $@ || exit $$?; \
                fi; \
        done

So it’s serialising the subdirectory builds; what is the problem here? The problem is that GNU make, to implement parallel builds, requires special options and descriptors to be passed down to the sub-make calls. This happens automatically when the sub-make is invoked directly through $(MAKE), but if the call is hidden behind other variables, as it is here, it does not happen automatically, and the developer has to tell GNU make to actually pass the options along.

Now the only problem here is to identify which rule you should add the + to, but this is very simple since the rule here already has a @ symbol at its start, so just make it +@ and it’ll be done. A very big problem can arise if the rule executes something that is not make together with make (and something more than just test), since then stuff might break hugely.

At any rate, after you actually change this rule (as well as the SOURCE_MAKERULE one), xfsprogs can finally build in parallel, taking much less time than it otherwise would. Cool, isn’t it?




published by flameeyes@gmail.com (Diego E. "Flameeyes" Pettenò) on 2008-11-15 15:22:00 in the "Gentoo" category

Since my blog post about forced --as-needed yesterday, I started building the whole portage in a chroot to see how many packages break with forced --as-needed at build time. While build-time failures are not the only problem here (for stuff like Python, Perl and Ruby packages, the failure may well be at runtime), build-time failures are probably the most common problem with --as-needed; if we also used --no-undefined, like Robert Wohlrab is trying (maybe a little too enthusiastically), most of the failures would be at build time, by the way.

As usual, testing one case also adds tests for other side cases; this time I’ve added further checks to tell me if packages install files in /usr/man, /usr/info, /usr/locale, /usr/doc, /usr/X11R6, ... and filed quite a few bugs about that already. But even without counting these problems, the run started telling me some interesting things that I might add to the --as-needed fixing guide when I get back to work on it (maybe even this very evening).

I already knew that most of the failures I’d be receiving would be related to packages that lack a proper build system and thus ignored LDFLAGS up to now (--as-needed included), but there are a few notes that are really interesting here: custom ./configure scripts seem to almost always ignore LDFLAGS and yet fail to properly link packages; a few glib-based packages fail to link the main executable to libgthread, failing to find g_thread_init(); and a lot of packages link the wrong OpenSSL library (they link libssl when they should link libcrypto).

This last note, about OpenSSL libraries, is also a very nice and useful example to show how --as-needed helps users in two main ways. Let’s go over a scenario where a package links in libssl instead of libcrypto (since libssl requires libcrypto, the symbols are satisfied, if the link is done without --as-needed).

First point, ABI changes: if libssl changed its ABI (it happens sometimes, you know…) but libcrypto’s stayed the same, the program would require a useless rebuild: it’s not affected by libssl’s ABI, but by libcrypto’s.

Second point, maybe even more useful at runtime: when executing the program, the libraries in NEEDED are loaded, recursively. While libssl is not too big, it would still require loading one further library that is unneeded to the program since libcrypto is the one that is actually needed. I sincerely don’t know if libssl has any constructor functions, but when this extra load happens with libraries that have many more dependencies, or constructor functions, it’s going to be a quite huge hit for no good reason.

At any rate, I wish to thank again all the people who contributed to pay for Yamato, as you can see the horsepower in it is being put to good use (although I’m still just at app-portage); and just so I don’t have to stress it to do the same work over and over again, I can tell you that some of the checks I add to my chroots for building are being added to portage, thanks to Zac, who’s always blazing fast to add deeper QA and repoman warnings, so that further mistakes don’t creep into the new code.. one hopes, at least.

Oh and before people start expecting cmake to be perfect with --as-needed, since no package using it has been reported as failing with --as-needed ... well the truth is that I can’t build any package using cmake in that chroot since it fails to build because of xmlrpc-c. And don’t get me started again on why a build system has to use XML-RPC without a chance for the user to tell “not even in your dreams”.




published by flameeyes@gmail.com (Diego "Flameeyes" Pettenò) on 2008-11-14 00:54:00 in the "Gentoo" category

Doug and Luca tonight asked me to comment about the --as-needed by default bug. As you can read there, my assessment is that the time is right to bring on a new stage of --as-needed testing, by forcing --as-needed through specs files. If you wish to help with testing (and I can tell you I’m working on testing this massively), you can do it by creating your own asneeded specs.

Note, Warning, End of the World Caution!

This is not a procedure for generic users; this is for power users who don’t mind screwing up their system even beyond repair! But it would help me (and the rest of Gentoo) with testing.

First of all you have to create your own specs file, so issue these commands:

# export SPECSFILE=$(dirname "$(gcc -print-libgcc-file-name)")/asneeded.specs
# export CURRPROFILE=/etc/env.d/gcc/$(gcc-config -c)
# gcc -dumpspecs | sed -e '/link:/,+1 s:--eh-frame-hdr:& --as-needed:' > "$SPECSFILE"
# sed -e '1iGCC_SPECS='"$SPECSFILE" "${CURRPROFILE}" > "${CURRPROFILE}-asneeded"
# gcc-config "$(basename "${CURRPROFILE}")-asneeded"
# source /etc/profile

This will create the specs file we’re going to use: it just adds --as-needed to any linking command that is not static. Then it creates a new configuration file for gcc-config pointing to that file. After this, --as-needed will be forced on. Now go rebuild your system and file bugs about the problems.
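To get a feeling for what such a specs edit does before touching the real compiler configuration, the substitution can be tried on a sample first. This is only a sketch: it assumes GNU sed (for the addr,+1 range), the function name add_as_needed is made up, and the spec line in the comment is a simplified, invented one rather than real gcc -dumpspecs output.

```shell
# Hypothetical filter: append --as-needed next to --eh-frame-hdr in the
# line following the "*link:" header, leaving every other spec untouched.
add_as_needed() {
    sed -e '/link:/,+1 s:--eh-frame-hdr:& --as-needed:'
}

# Simplified example of the effect on a link spec:
#   before: %{!static:--eh-frame-hdr} %{shared:-shared}
#   after:  %{!static:--eh-frame-hdr --as-needed} %{shared:-shared}
#
# Preview on the real specs without writing anything:
#   gcc -dumpspecs | add_as_needed | grep -A1 '^\*link:'
```

Since the option ends up inside the %{!static:...} block, it only applies to links that are not static.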

A package known to break with this is xmlrpc-c, which in turn makes cmake fail; as some people lack common sense (like adding a way to not have xmlrpc-c as a dependency because you might not want your cmake to ever submit the results of tests), this can get nasty for KDE users. But maybe, just maybe, someone can look into fixing the package at this point.

But xmlrpc-c does require some reflection on how to handle --as-needed in these cases: the problem is that the error is in one package (xmlrpc-c) and the failure in another (cmake), which makes it difficult to assess whether --as-needed breaks something; you might have a broken package, but nothing depending on it, and never notice. And a maintainer might not notice that his package is broken because other maintainers will get the bugs first (until they get redirected properly). Indeed, it’s sub-optimal.

Interestingly enough, Mandriva actually started working on resolving this problem radically: they inject -Wl,--no-undefined in their build procedures so that if a library is lacking symbols, the build dies sooner rather than later. This is fine up to a certain point, because there are times when a library legitimately has undefined symbols, for instance if it has a circular dependency on another library (which is the case for PulseAudio’s libpulse and libpulsecore, which I discussed with Lennart some time ago). Of course you can work around this by adding a further -Wl,--undefined that then tells ld to discard the former, but it requires more work, and more coordination with upstream.
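The effect of --no-undefined is easy to try on a throwaway library. The following is a minimal sketch, assuming gcc with a GNU-compatible linker; demo_no_undefined, wrapper and missing_symbol are names invented for the example.

```shell
# Hypothetical demo: link a tiny shared library that references a symbol
# nobody provides, once normally and once with -Wl,--no-undefined.
demo_no_undefined() {
    tmpdir=$(mktemp -d) || return 2
    cat > "$tmpdir/lib.c" <<'EOF'
extern int missing_symbol(void);
int wrapper(void) { return missing_symbol(); }
EOF
    # Shared libraries are normally allowed to carry undefined symbols.
    if gcc -shared -fPIC -o "$tmpdir/libdemo.so" "$tmpdir/lib.c" 2>/dev/null; then
        echo "plain: link succeeded"
    else
        echo "plain: link failed"
    fi
    # With --no-undefined the linker is asked to reject them.
    if gcc -shared -fPIC -Wl,--no-undefined -o "$tmpdir/libdemo2.so" "$tmpdir/lib.c" 2>/dev/null; then
        echo "with --no-undefined: link succeeded"
    else
        echo "with --no-undefined: link failed"
    fi
    rm -rf "$tmpdir"
}
```

Calling demo_no_undefined shows the difference: the missing symbol is caught when the library itself is built, rather than much later in whatever package links against it.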

Indeed, coordination with upstream is a crucial point here, since having to maintain --as-needed fixes in Gentoo is going to be cumbersome in the future, and even more so if we start to follow in Mandriva’s steps (thankfully, Mandriva is submitting the issues upstream so that they get fixed). But I admit I also haven’t been entirely straight on that; just today I pushed a series of patches to ALSA packages, one of which, for alsa-tools, was for --as-needed support (the copy we have in Portage also just works around the bug rather than fixing it). Maybe we need people who start checking the tree for patches that haven’t been pushed upstream and try to push them (with proper credit, of course).

Another thing we have to consider is that many times upstream provides broken Makefiles, and while sending the fix upstream is still possible, fixing it in the ebuild takes more time than it is worth; this is why I want to refine and submit for approval my simple build eclass idea, which at least works as a mitigation strategy.




published by flameeyes@gmail.com (Diego "Flameeyes" Pettenò) on 2008-11-09 16:47:00 in the "Gentoo" category

This week I’ve been doing lots and lots of work on lscube, which I wanted to blog about, but I’m afraid I won’t have much time to do that right now. So I wanted to take a break this weekend, to relax, rest, and play a bit. Nothing went like I wanted, and I actually think I won’t try to take a break like this again anytime soon.

Yesterday almost went like I wanted: I worked a bit on my ruby-bombe project, removed the pending notice on some specs, and actually implemented some functions in it, although it’s still unable to read. I set up the SysRescueUSB key, but then I had to finish a support request on a Vista laptop. Could be worse, but Vista by itself is a problem for me: I cannot connect Vista laptops to my router directly, they kill it! I have to connect them through one of my laptops (my mother’s or mine) or through Yamato. I have no idea why. Okay, no problem. I finished the cleanup, and the laptop was ready to go.

Today, I was woken up very early since I was to be at home by myself, which also called for a good morning of relaxing playing with the PlayStation… yeah sure. First my brother in law dropped by with my nephew to pick up some of his tools that were left here, then I finally cleared out some tasks for a job I have to do, which called for testing, after lunch.

But while I finished setting up the stuff for this job I also decided to set up another side job I was commissioned, which I’ll try to blog about at another time since it’s really interesting to me and I might actually release it as Free Software afterward. For this job, though, I need Windows, since it’ll have to run on Vista. I have an XP license (not OEM, thus not tied to a box) but it was set up to work on the laptop, since that was the most powerful box I had at the time, and Enterprise was not VT-X capable so I couldn’t run it there virtually; now the laptop is no longer the most powerful machine at home, and I don’t care much about playing on XP (I only have two games running there and they should work fine on Wine nowadays), so I wanted to move it out, reclaim the free space on the laptop, and install it on VirtualBox. Unfortunately as soon as VirtualBox starts, Xorg crashes, and it does not wake up the video card when it starts back.

Two hours of fiddling later, I found out:

  • gnome-settings-daemon, updated today, does not always start properly; no clue why yet;
  • SDL applications were killing my Xorg, I noticed this with tuxtype before, and the same happened with VirtualBox;
  • the reason why Xorg couldn’t wake up the videocard was the framebuffer; since I don’t even use the framebuffer when SSH is not working (I love serial consoles) but rather the laptop so I can have cut and paste, I just disabled it and I live happier now;
  • the Xorg crash was in the VidMode calls;
  • the Xorg crash was already reported as FreeDesktop bug #17431;
  • the Xorg crash was caused by SDL bundling its own X11 libraries!

So once again not following to the letter the policy that tells us to always use system libraries wasted more of my time than it should have. I guess this is life for you.

On the other hand, I need Mono for the commissioned work I talked about above, but Mono in Portage is currently in very bad shape, since compnerd hasn’t bumped anything in quite some time. Today I opened a few bugs, for Gtk# for instance, and I hope I’ll be able to bump a few things in my overlay and force-feed them to Portage in a week or two.

On a different note, I’m starting to have some hardware cravings: I currently have no free USB port, no Ethernet cable and just one USB to serial adapter left, which is very suboptimal. I guess as soon as my job pays me I’ll be getting a self-powered USB hub so that at least I can get a few extra ports. I have a crimping tool for Ethernet cables but it does not do Cat5e cable, and I wanted to buy Cat5e if I was to buy more new cables…




published by flameeyes@gmail.com (Diego "Flameeyes" Pettenò) on 2008-11-08 14:17:00 in the "Gentoo" category

Today I set up a USB stick to act as a SysRescueCD LiveUSB disk, since my CD-RWs started failing on me (probably they are too old by now: they are more than five years old, with a huge number of erase cycles on them). Since I end up using SysRescue fairly often, including to install Gentoo on my boxes, I decided to put it on a flash drive; most of the systems I maintain nowadays boot from USB sticks just fine.

My hope is actually to have a decent Live DVD-like system to use, something with a little more software on it for when I have to actually USE foreign systems, and I already found a 4GB USB stick cheap enough for that. But in the meantime, SysRescue will do just fine. I already had the latest ISO image so I ended up just following the instructions on their site.

But there is a note missing there, and even Fedora’s LiveUSB page does not seem to put much emphasis on the issue, although it gives a way to fix it. While SysRescue seems to say that the results depend on the hardware (which I highly doubt), Fedora’s How To actually has an “Errors and Solutions” section that shows how to make a partition bootable, and there’s even a note about adding an MBR to the flash drive.

I think that’s most likely the common problem there: most USB keys lack an MBR so computers won’t boot from them by default. Add the sample MBR you find with syslinux, even in Gentoo, and you’ll be done with it.
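Writing an MBR boot sector can be sketched with dd. This is a hedged example, not the procedure from either How To: install_mbr is a made-up helper, the /usr/share/syslinux/mbr.bin path is an assumption about where the distribution installs the sample MBR, and the 440-byte size is the boot code area, so the partition table that follows it is left alone.

```shell
# Hypothetical helper: copy the first 440 bytes (boot code only) of an
# MBR image onto a target, without truncating or touching anything else.
install_mbr() {
    mbr_image=$1   # e.g. /usr/share/syslinux/mbr.bin (assumed path)
    target=$2      # the whole device (e.g. /dev/sdX), NOT a partition
    dd if="$mbr_image" of="$target" bs=440 count=1 conv=notrunc 2>/dev/null
}

# Usage, as root, after triple-checking the device name:
#   install_mbr /usr/share/syslinux/mbr.bin /dev/sdX
```

Pointing this at the wrong device overwrites that disk’s boot code, so trying it on an image file first is a good idea.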

I hope this service entry can be helpful to somebody who’s fiddling with USB sticks and Live distributions.




published by flameeyes@gmail.com (Diego "Flameeyes" Pettenò) on 2008-11-03 05:50:00 in the "Gentoo" category

My work on ruby-bombe seems to be proceeding nicely; for now it’s a very nice Ruby library that … seeks. The only thing I’ve implemented up to now, and not yet entirely, is seeking. Why this? Because it’s probably the trickiest part, since on some backends it has to be emulated, and on others it has to be adapted. Reading is probably going to be easier to implement.

Together with writing support for seeking, I also added tests, a huge amount of tests to make sure that my code works as I want; I have to thank Jason again, since RSpec makes it much easier to understand if the tests are being enabled, and makes it also easier to split the tests to reuse the same logic, which Test::Unit makes much more difficult.

But this is not what I’m here to talk about; I’m rather trying to flesh out, for myself and for those interested in ruby-bombe, some notes about consistency. I found that using ruby-mmap to access files is totally inconsistent with accessing them through the File class. In particular, the ruby-mmap extension always raises ArgumentError exceptions when the path used points to a file that does not exist, or when the file is unreadable. I’ll have to add quite a few more tests, for instance to check what happens when giving the path to a directory rather than a file.

As of this moment, ruby-bombe has its own exceptions to handle FileNotFound and PermissionError cases, rather than using the Errno module exceptions, since I’m not really sure if they would apply properly to network cases like HTTP and similar. For the rest, I’m trying to use the standard Ruby exceptions whenever they make sense (ESPIPE sincerely is not very user-friendly in my opinion).

Unfortunately, of course, I don’t know all of Ruby’s facets myself, so here is why I’m blogging: I’d be very happy if somebody could help me ensure that the behaviour, the naming, and the tricks used in my library are consistent with the rest of the language. I think I was able to get a similar enough interface, even when the backend libraries are pretty inconsistent with the rest of Ruby (like ruby-mmap).

As of now, I have backends for IO streams (like pipes), files in particular (with path-based access), sockets (both TCP and UDP), gzip-compressed files (with emulated seek), memory-mapped files and strings/arrays. Planned are at least bzip2-compressed files with no seeking, bzip2 files with seeking (using lots of memory) and HTTP-downloadable files. The two backends I’m particularly interested in completing, for ruby-elf, are the gzip and mmap backends. The reasons are very practical: the first is needed to reduce the space taken up by the testsuite of ruby-elf (since the files don’t have to be executed, they might as well be compressed); the second is interesting if I get to implement scanelf.rb or something like that, since mapping ELF executables and libraries into memory is most likely going to find some data already loaded in system memory, mapped for files that are loaded for execution. This might actually improve its performance, but before judging that I’ll have to write the code.

On a different note, if you’re interested in Ruby packages, I’ve added two more ebuilds in portage: dev-ruby/uuidtools and dev-ruby/flickr, both extensions used by Typo. While the upstream version gets them from Subversion, I’ve decided that, to try to reduce the amount of code I cannot easily control, I’m going to pick them up from Portage, like I did for the original 4.0 version on Gentoo/FreeBSD. So enjoy them, if you need them.




published by flameeyes@gmail.com (Diego "Flameeyes" Pettenò) on 2008-11-02 13:36:00 in the "Gentoo" category

After my post about the long road to Free Java I tried to ask everybody who might have a clue about it, and found out what the root cause of the problem was.

Basically, when IcedTea6 is built, it has to bootstrap itself, so it first builds itself with the JDK you provide (gcj-jdk) and then it rebuilds it with icedtea6 itself; but to rebuild itself, it sets the JAVA_HOME variable during build, hoping for ant to pick it up. But by choice of the Gentoo Java team, the JAVA_HOME variable is not supported nor respected, so the override fails, and it tries to build itself still with the previous compiler, the wrong one.

How can this work for anybody then, like Andrew said? Well, the trick is in the keyword you use. On stable systems, ant-core-1.7.0-r3 from the Java overlay is picked up, which contains a hack from Andrew (no, you cannot call it “the proper way” since it does not fix the comments; if your idea of a hack does not encompass making a change and leaving the contradicting comments still there, then I start to worry…) to allow respecting JAVA_HOME. If you are on unstable systems, you’re going to get ant-core-1.7.1 from the main tree; that version does not have the hack, and thus will fail to build IcedTea6. I’m not sure where David Philippi has seen ant-core-1.7.1-r1 from java-overlay, since it still has the old version.

So I decided that even if it does not conform to my usual QA strictness, I wanted to try out IcedTea6. The reason for that is that I’m addicted to Yahoo Games and I haven’t found any free software package yet that supports playing Canasta online, for instance… and I was tired of using the laptop for that since I have Yamato here all the time. I then disabled my --as-needed compiler (the build system fails when it comes to properly ordering the linking lines), installed the hacked ant-core, and merged icedtea6.

This time 1.3.1-r1 finally merged and I could try it out, good! about:plugins on Firefox shows me that it’s picked up, but … once I get to Yahoo games page, it does not really work: the “table” window opens, but then it does not load the applet, it goes in timeout and tries to reload; does so a few times, then Yahoo tells you to disable popup blockers.

I tried a couple more applets along the same lines, but it still failed quite badly, crashing a couple of times. Yeah, we’re on the road to a Free Java, but we’re certainly not there yet.

On the other hand, if somebody knows how to debug problems like the ones I described above, I’d be glad to provide more information to the icedtea/openjdk developers to see that they get resolved and we can finally have a working nsplugin on AMD64.




published by flameeyes@gmail.com (Diego "Flameeyes" Pettenò) on 2008-10-30 18:25:00 in the "Gentoo" category

Since the testsuite for Ruby-Elf starts being disproportionate to the actual code that Ruby-Elf consists of, and I’m still lacking regression tests for the two scripts in it (missingstatic and cowstats), I’ve considered some time ago to support accessing ELF files compressed with either gzip or bzip2 so that the space required would be drastically reduced.

Unfortunately my idea ended up being unrealisable, at least at the time, since Ruby-Elf needs to seek, and neither format allows for easy seeking around.

I started working on some generic IO access to files in a branch, but it didn’t turn out very good, and I left it behind for a while. Since now I’m at the point I really need to write the testsuites for the two scripts, I decided to revive the idea, and implement it with a system of “backends”.

I started with two simple backends: access through path (with a File instance) and access with a direct IO stream. Very easy and not really complex at all. Then I introduced a ruby-mmap backend so that the file could be mapped into memory rather than read and copied over, and this also was fine, although I had to emulate the cursor handling (seek and tell). Reading gzip compressed files was also quite easy since Ruby already provides a good interface to that. Unfortunately bzip2 support is a totally different matter, since the bzlib interface does not provide the tell and rewind methods that are needed to emulate seek for compressed files (slowly).
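Emulating seek on a compressed file boils down to rewinding to the start and throwing away everything before the wanted offset, which is why it is slow. A rough shell analogue of the idea (not the Ruby-Elf code; read_at is a hypothetical helper, and it assumes gzip plus a head that supports -c):

```shell
# Hypothetical helper: print LENGTH bytes starting at byte OFFSET of the
# decompressed content of a gzip file. Every call decompresses from the
# beginning and discards the first OFFSET bytes, so cost grows with OFFSET.
read_at() {
    file=$1
    offset=$2
    length=$3
    # tail -c +N starts output at byte N (1-based), hence offset + 1
    gzip -dc "$file" | tail -c "+$((offset + 1))" | head -c "$length"
}

# Usage (file name is an example): bytes 16..19 of the decompressed data
#   read_at sample.elf.gz 16 4
```

For a testsuite full of small ELF files the repeated decompression is probably tolerable; for anything large it makes the appeal of real seeking obvious.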

The problem at this point is that the code for the backends is complex on its own, and it would add to Ruby-Elf’s complexity to the point that the two wouldn’t really make sense together at all. Having reached this point, you know what comes next: factoring the code out.

For this reason I’m now thinking about what the best course of action is; I want to have access to possibly a lot of backends: straight files, compressed files (gzip, bzip2, lzma), archive files (tar and pax, ar, zip, ...; ruby-libarchive would help here), network files and so on. I already decided on the name: ruby-bombe (if you like history you should get the reason for the name). The problem now is taking the code out of Ruby-Elf, writing ruby-bombe, adapting Ruby-Elf to the new library, and hoping none of the three users of ruby-elf gets mad at me for requiring a dependency.

Tonight is going to be a long night.




published by flameeyes@gmail.com (Diego "Flameeyes" Pettenò) on 2008-10-29 23:57:00 in the "Gentoo" category

Up to now in my series I’ve written about fixing upstream projects and I’ve given hints on how to design a properly parallel-safe build system. I haven’t written anything yet about handling the ebuilds.

While my proposal for replacement of simple makefiles would take care of most minor parallel make issues, it is limited to fixing very broken build systems since for totally non-complex software, parallel make is not an issue at all; most problems happen with complex custom rules. For all the most complex cases, you need to fix the build system appropriately, patch it down and so on.

But before you can get to that you have to take care of handling the ebuild correctly. While it’s certainly not a cool thing for owners of multicore systems to serialise a build, it’s also not good for them to have a package failing, even if it’s during a limited time before the build system is fixed. But if you add -j1 to an emake call, while you review what the problem is, there is a huge chance that the problem will remain there, hidden.

So when you have to deal with such a problem, my suggestions are these:

  • make sure that you check whether the build fails with parallel make; this involves checking it multiple times at multiple levels on a true multicore system; the reason for this is that parallel build issues are race conditions, and they might require specific conditions to show up;
  • if you can identify for sure that there is a parallel make issue, open a bug for it; even if it’s your package, open a bug for it, it will help you track it down; having a bug for each failure is very important since you need to know that there is a bug to ensure it’s fixed;
  • add -j1 to the ebuild’s emake call; this is a temporary measure, a hack, something that you should never rely on; but having it there will prevent build failures until you can fix the original bug;
  • write a comment referencing the bug you just opened where -j1 was added; this will ensure that finding the reason for the non-parallel make will just require a bug lookup rather than searches and searches;
  • when you commit the ebuild, make sure the ChangeLog also references the bug number; make it as noticeable as possible that there is still a bug and you’re just working around it;
  • and the critical part of this: keep the bug open! Some developers hate having bugs open and would rather close everything even when they work around the bug rather than fix it, waiting for upstream or someone else to fix it properly; this is a mistake here: you have to leave the bug open.
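Checking “multiple times at multiple levels” can be scripted. The loop below is only a sketch under stated assumptions: stress_parallel_build and build_once are made-up names, the job levels 2, 4 and 8 are arbitrary, and build_once assumes a plain make-based package with a working clean target.

```shell
# Hypothetical builder: one clean build at the given parallelism level.
build_once() {
    make clean >/dev/null 2>&1 && make "-j$1" >/dev/null 2>&1
}

# Hypothetical helper: call BUILDER RUNS times at each job level and
# stop at the first failure, reporting the level and run that broke.
stress_parallel_build() {
    builder=$1
    runs=$2
    for jobs in 2 4 8; do
        run=1
        while [ "$run" -le "$runs" ]; do
            if ! "$builder" "$jobs"; then
                echo "build failed at -j${jobs} on run ${run}"
                return 1
            fi
            run=$((run + 1))
        done
    done
    echo "no failure seen in ${runs} runs per level"
}

# Usage, from the unpacked source directory:
#   stress_parallel_build build_once 10
```

A clean pass proves nothing about a race condition, of course; it only means the race did not trigger this time, which is why the repetitions matter.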

When you add -j1 to an ebuild, you’re doing so as a contingency measure, to avoid users complaining that the package does not build; but you’re more than likely to have users complaining that the package does not use parallel make either, and they are right on that, it should. By closing the bug, you’re telling them to “go away” since the package builds, which is not what you should be doing. Instead you should acknowledge that there is a bug, and that it has to be fixed.
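In the ebuild itself, the workaround and its comment might look like the fragment below. It is a sketch only: the bug number is a placeholder, not a real bug, and a real ebuild would of course carry the rest of its usual contents around this function.

```shell
src_compile() {
    # Parallel make is broken here; forced serial build as a workaround.
    # See bug #000000 (placeholder number), which stays OPEN until the
    # build system is actually fixed.
    emake -j1 || die "emake failed"
}
```

The comment is what lets the next person find out why -j1 is there with a single bug lookup.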

So if you find a bug from me about a parallel make issue and I changed the ebuild to force -j1, don’t you dare close the bug or I might really get annoyed… if you don’t know how to fix it, just ask me, savvy?




published by flameeyes@gmail.com (Diego "Flameeyes" Pettenò) on 2008-10-29 10:22:00 in the "Gentoo" category

If you’ve been reading my blog for a while, you know I was really enthusiastic about Sun opening the JDK so that we could have Free Java. This should have solved both the Java trap and the Java crap (the false sense of portability among systems), and I was really looking forward to it.

When Sun finally made OpenJDK sources available, the Gentoo Java team was already on the field, and we were the first distribution packaging it, albeit in an experimental overlay that most users wouldn’t want to try. I also tried to do my best to improve the situation by submitting buildsystem fixes to Sun itself to get it to build on my terms, which means system libraries, --as-needed support and so on. The hospital set me back so I couldn’t continue my work to have something working, so in the end I gave up. Too bad.

After I came home I discovered that the IcedTea idea seemed to work fine, and the project was really getting something done, cool! But it wasn’t ready for prime time yet, so I decided to wait; I tried getting back on track last summer, but hospital set me back again, so I decided to not stick around too much, being out of the loop.

But since I stopped using Konqueror (with the rest of KDE) and moved to Firefox I’m missing Java functionality, since I’m on AMD64 and I don’t intend to use the 32-bit Firefox builds. So I decided to check out IcedTea6, based on OpenJDK 6 (that is, the codebase of the 1.6 series of Sun JDK, which should be much more stable). IcedTea6 actually got releases out, 1.2, 1.3, 1.3.1 now. Even though Andrew seems to be optimistic, this is not working just yet.

First problem: while OpenJDK only bootstrapped with Sun JDK 1.7 betas, IcedTea6 only bootstraps with IcedTea itself or another GNU classpath based compiler, like gcj or cacao. Okay so I merged gcj-jdk and used that one; IcedTea6 fails when I force --as-needed through compiler specs. Since the build system is quite too much of a mess for me I didn’t want to try fixing that just yet, I just wanted it to work, so I disabled that and restarted build. With gcj-jdk it fails to build because of a problem with source/target settings. Okay I can still use cacao.

The first problem with cacao is that gnu-classpath does not build with the nsplugin USE flag enabled since it does not recognise xulrunner-1.9; there is a bug open for that but no solution just yet. I disabled that one and cacao builds, although IcedTea6 fails later on with an internal compiler error. Yippee!

And this is not even the end of the Odyssey; I want to get Freemind to work on Yamato since I’ve tried it out and it really seems cool, but I can get it to work only on OS X for now, since to build some of its dependencies on Gentoo I need a 1.4 JDK, but on AMD64 there is no Sun JDK 1.4, no Blackdown, just Kaffe… and Kaffe 1.1.7 (the latest is 1.1.9), which is not considered anything useful or usable (and indeed it fails to build the dependency here).

I think the road is still very long, and very tricky. And I need to get a Java minion to help me find out what the heck the problem is!


