Scientific Revelations

Continuing my recent theme of “CentOS is now not very useful”, I thought I should check up on how Scientific Linux version 7 was getting on… and got a bit of a surprise.

Scientific Linux was a joint project of CERN (the European atom-smashers) and Fermilab (the American equivalents) and dates back to the days of Red Hat Enterprise Linux 3 in 2004. So it’s venerable, and supported and developed by two prestigious and highly-trustworthy organisations. In the past, that’s been important, because the lack of transparency shown by the CentOS developers at times meant that there was precious little trust to be had in that particular distro!

In January this year, however, Red Hat announced it was ‘going into partnership’ with CentOS: it hired some of their lead developers, for example. If it’s possible to buy-out an open source project, Red Hat can fairly be said to have bought out CentOS. This announcement caused the CERN part of the Scientific Linux team to decide not to bother developing their own SL7 distro: instead, they said, they would adopt CentOS 7 and seek to become a ‘CentOS Special Interest Group’, giving them the chance to customise the basic CentOS installation.

Only the Fermilab developers decided to pursue their own, independent Scientific Linux distro into its version 7.x incarnation.

I’m glad someone is still pursuing a dedicated Scientific Linux distro: the competition and point of comparison with CentOS has been quite important in the past (especially when the CentOS team appeared to have completely lost the plot, around the time that version 6.0 was released). If all Enterprise Linux distros were just to be respins of CentOS (which is now, remember, essentially a not-paid-for subsidiary of Red Hat itself), we would be much the poorer for it. I don’t fancy a world in which I only have a “choice” between Red Hat and Oracle, for example.

Happily, Fermilab have indeed pushed on and a release of their version 7.0 is nearly upon us: the second beta was made available last Friday, for example. Here’s to their continued development efforts: we need them to succeed and to keep on wanting to succeed.

CentOS 7 – Not Recommended

I’ve been poking around with CentOS 7, Red Hat 7 and Oracle Enterprise Linux 7 extensively in the past couple of weeks, in the hope of producing a 7-compliant version of Asquith. For the sake of the rest of this post, let’s agree to call all those distros, generically, Enterprise Linux 7.x.

It’s been quite a ride, because an awful lot has been changed in the transition from Enterprise Linux 6.x to 7.x. I’ll list just some of the differences that have specifically tripped me up here:

  • How you invoke a Kickstart installation in the first place has changed
  • How you do firewalling has changed (firewalld not iptables)
  • The packages and package groups have changed
  • The way you configure iSCSI targets has changed, dramatically, and uses an interactive shell that isn’t suitable for scripting and doesn’t work with Kickstart
  • The way you disable and enable, stop and start services has changed (systemd v. init scripts)
  • The way network devices are named is now “intelligent”… and completely borks Kickstart

I’ll just explain a little more about that last point, by the way, since I’ve not mentioned it at all in any previous blogs. The gist of it is that you probably know and love your network interfaces as things like “eth0” and “eth1”, and have done for years. But they aren’t called that any more. Oh no. Instead, you get names such as “enp0s3” and “eno16777736” …and (this is the particularly cunning bit): you get different names depending on what your hardware and your BIOS is capable of.

The idea behind the change is logical and admirable in and of itself: in a server with two Ethernet cards, you were never entirely sure which one would pick up the “eth0” designation and which the “eth1”. Whereas now, the names are bus/slot dependent and are thus assigned deterministically: the card in slot 3 gets the ‘s3’ name and the one in slot 4 gets an ‘s4’ name. Simple, although you won’t know what your interface is going to be called until after the installation has itemised all your hardware and assigned the appropriate names.

The trouble is that in Kickstart, we used to do this sort of thing:

network --device eth0 --bootproto static --ip 192.168.8.101 --netmask 255.255.255.0 --nameserver 192.168.8.250 --hostname alpher.dizwell.home
network --device eth1 --bootproto static --ip 10.0.0.101 --netmask 255.255.255.0 --nameserver 192.168.8.250 --hostname alpher-priv

That is to say: assign one set of network attributes to an interface called “eth0” and another to one called “eth1” …and do those assignments before the installation has completed.

So how do you write equivalent lines for an Enterprise Linux 7 Kickstart script when you don’t know whether your interfaces are going to be called “enp0s3” or “eno16777736” or something completely different until after the installation has finished? You can’t. It’s just impossible to write one Kickstart script to run on any hardware now, because you won’t know what the device names are ahead of time.

A case in point: my laptop and my home desktop PC, running a virtual machine in VMware Workstation, both produce Linux guests that end up with a network interface called “eno16777736”, but my work PC (also running VMware) produces guests that have an “enp0s3” network interface. One Kickstart script cannot do duty for both environments… and I have no idea what other variants my readers and would-be Asquith users might end up with, so I can’t even start taking them into account!

It’s a mess. In plain words: Enterprise Linux 7 breaks Kickstart installations.

I’ve had to work around it for now by reducing the Kickstart network line down to:

network --hostname=asquith.dizwell.home

…which doesn’t attempt to configure networking interfaces at all. Later on in my Kickstart script, in the %post section, I then cheat like mad and run this:

# Loop over every interface the kernel knows about, skipping loopback
for f in $(ls /sys/class/net); do
  if [[ $f != "lo" ]]; then
    cat > /etc/sysconfig/network-scripts/ifcfg-$f << EOF
DNS1=192.168.8.250
GATEWAY=
BOOTPROTO=none
DEVICE=$f
ONBOOT=yes
IPV6INIT=no
TYPE=Ethernet
IPADDR0=192.168.8.250
PREFIX0=24
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
NAME="System $f"
EOF
  fi
done

…which simply writes a network configuration file for any non-loopback interface it finds listed in the /sys/class/net directory. This is taking place after the installation has all-but completed, so by that stage the finished interface names should be available. As that stands, it will write the same configuration for both interfaces in a 2-interface server, which is obviously not right… but a little bit of bash if-then-else’ing should see that right. For Asquith itself, which only has one network interface to worry about, this code as written will work no matter what interface names your installation decides to bestow upon your guest.
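The “if-then-else’ing” mentioned above might look something like the following. This is only a sketch under stated assumptions: the rule that the first interface (in alphabetical order) is the public one, the private address 10.0.0.101 and the function names write_ifcfg and configure_interfaces are all invented for illustration, and the configuration file is trimmed to the bare essentials.

```shell
#!/bin/bash

# write_ifcfg DEVICE IPADDR OUTDIR
# Writes a minimal static configuration file for one interface.
write_ifcfg() {
  local dev=$1 ip=$2 outdir=$3
  cat > "$outdir/ifcfg-$dev" << EOF
DEVICE=$dev
NAME="System $dev"
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=none
IPADDR0=$ip
PREFIX0=24
EOF
}

# configure_interfaces SYSDIR OUTDIR
# ASSUMPTION: the first non-loopback interface (alphabetical order) is the
# public one; every later interface gets the (hypothetical) private address.
configure_interfaces() {
  local sysdir=$1 outdir=$2 count=0 path dev
  for path in "$sysdir"/*; do
    [[ -e $path ]] || continue        # empty directory: nothing to do
    dev=${path##*/}                   # strip the directory part
    [[ $dev == lo ]] && continue      # never touch loopback
    if (( count == 0 )); then
      write_ifcfg "$dev" 192.168.8.250 "$outdir"
    else
      write_ifcfg "$dev" 10.0.0.101 "$outdir"
    fi
    count=$((count + 1))
  done
}
```

In a %post section this would be called as configure_interfaces /sys/class/net /etc/sysconfig/network-scripts. With more than two interfaces every “extra” one would get the same private address, so a real script would need a smarter rule.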

But it’s not “right” and it’s certainly not elegant, and the fact can’t be dodged that in their eagerness to embrace “meaningful network interface names”, the Enterprise Linux developers broke Kickstart. Hopefully, they will un-break it before too long.

Anyway… all these changes I’ve been whinging about of late apply equally well to all three variants of Enterprise Linux that I’ve been working with. It doesn’t matter whether you use CentOS, RHEL or OEL: Kickstart (for example) will struggle to configure your network interfaces correctly in all of them.

I do, however, have a special word of opprobrium to hurl uniquely in CentOS’s direction: it’s the only one of the three distros that decided it was more important to have a word processor on your Enterprise Linux server than a working Oracle database.

Let me explain: When you register for the Red Hat trial and download the 3.4 GB Red Hat 7 ISO; or when you download the Oracle Enterprise Linux 4GB ‘V46135-01’ ISO…. in both cases you end up with a single DVD image which contains a mix of 32-bit and 64-bit libraries/packages. As you know, Oracle’s Linux installs still require a mix of 32-bit and 64-bit packages to work properly (for example, you need glibc-x86_64 and glibc-i686 before things will compile correctly during the ‘linking phase’ of the Oracle database installation). So Red Hat and OEL both provide distro installation media which can satisfy those requirements.

But if you download the 4GB CentOS 7 installation DVD, you get pure, 64-bit only packages. No .i686 software exists at all, and thus no Oracle software installations are possible with it. I asked on the CentOS forums why they decided to package things up quite differently than their upstream vendor (i.e., Red Hat) did and the only explanation someone offered was that “to make room for the LibreOffice software, they had to ditch the i686 libraries”. I’m not sure if that’s so (a DVD ISO can be 4.7GB, so there’s room for 700MB of extras on the CentOS DVD even as it stands), but if it were true, it’s a weird choice: we package our Enterprise Linux distro so that you can run a word processor, but not an Enterprise-class database. You figure the logic of that, because I can’t see any in it.

You can, of course, install Oracle database software on CentOS by the “simple” expedients of either (a) connecting your server directly to the Internet and downloading the relevant 32-bit packages with yum; or (b) downloading the CentOS “Everything” ISO, instead of the plain-vanilla “DVD ISO”. The Everything ISO is 6.7GB in size and does include the i686 software packages you’ll need. But that means it’s nearly 3GB bigger than OEL or RHEL’s Oracle-ready equivalents.

I shall be interested to see how Scientific Linux do their packaging when the time comes (they are currently stuck at version 6.5, so I don’t know when or if SL7.0 will be making an appearance).

In the meantime, I shall have no choice but to strongly recommend NOT using CentOS 7 as a platform for Oracle databases. I’ll be switching all my development work to OEL 7.x, which is Oracle-database-ready AND can be downloaded and updated for free. CentOS just seems too weirdly and obtusely different from the other Enterprise Linux distros to be worth bothering with at the moment.

Updated to add: This isn’t the only point at which CentOS diverges in annoying ways from Red Hat or Oracle’s treatment of what is supposed to be essentially the same distro: try doing an lsb_release -r to see what version your distro reports itself to be. Red Hat reports 7.0. OEL reports 7.0. CentOS, however, decides it will be clever and report 7.0.1406. Version number reporting is important to Asquith, because it determines where you’ll fetch your software from when building client servers. Having one distro decide to be different from all the others is therefore, frankly, rather annoying!
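One way a script might cope with the three-part CentOS version string is to trim whatever lsb_release reports down to its first two components before using it. The function name major_minor is made up for this sketch, and it assumes the version is always dot-separated:

```shell
#!/bin/bash

# major_minor VERSION
# Reduces a dotted version string to "major.minor", so that 7.0.1406
# (CentOS) and 7.0 (Red Hat, OEL) end up looking the same.
major_minor() {
  local major minor rest
  IFS=. read -r major minor rest <<< "$1"
  printf '%s.%s\n' "$major" "${minor:-0}"
}
```

It would be used along the lines of major_minor "$(lsb_release -rs)", where -rs asks lsb_release for just the release number.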

Indirection

Another in the series of the fun things that are new about CentOS 7: setting up an iSCSI target. Asquith used to do this with this sort of code:

lvcreate -l 70%VG -n asquith-dbdata vg1
echo "<target iqn.2013-08.home.dizwell:asquith.dbdata>" >> /etc/tgt/targets.conf
echo "       backing-store /dev/vg1/asquith-dbdata" >> /etc/tgt/targets.conf
echo "</target>" >> /etc/tgt/targets.conf
chkconfig tgtd on
service tgtd start

This is all good, old-fashioned stuff involving writing something to a configuration file and then starting a daemon to use it.

But this is not how we do it now. Oh no. In CentOS 7 (and its Red Hat-y and Oracle-y equivalents, of course), we use an interactive tool called targetcli to invoke an independent ‘shell’ in which these sorts of commands are issued instead:

backstores/block create name=dbdata dev=/dev/mapper/vg1-asquith--dbdata
iscsi/ create iqn.2014-07.home.dizwell:asquith.dbdata
cd iscsi/iqn.2014-07.home.dizwell:asquith.dbdata/tpg1
portals/ create
luns/ create /backstores/block/dbdata
set attribute authentication=0
set attribute generate_node_acls=1
exit

…and the final exit there takes you back to your original shell. (I probably should say that targetcli itself is not actually that new, having first been released in about 2009… but it’s now the default way of doing things in the 7 release of Enterprise Linux, and that’s new, at least to me!).

Anyway, targetcli is definitely nice and easy and there are no services to worry about: it all just starts getting shared by magic. About the only way to check anything is actually working after you’ve issued all those targetcli commands is to do:

netstat -ant

…before and after. If port 3260 is not in use before, but is in use afterwards, then you know it’s working properly.
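For a scripted build, eyeballing the netstat output isn’t much use, so the same check could be wrapped in a small test. This is a sketch only: the function name port_listening is invented, and it assumes the usual netstat -ant layout, with the local address in the fourth column and the state in the last.

```shell
#!/bin/bash

# port_listening PORT
# Reads "netstat -ant" output on stdin and succeeds only if some socket
# is in the LISTEN state on the given local port.
port_listening() {
  awk -v port="$1" '
    $NF == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }
  '
}
```

Something like netstat -ant | port_listening 3260 && echo "iSCSI target is listening" could then run after the targetcli commands.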

The real bummer about the new technique, however, is that it’s not really very scriptable. It’s an interactive tool after all, and appears to expect a system admin to be sitting at the end of the keyboard… which is not much use if you’re trying to get this all to happen as part of a Kickstart auto-build, say!

I did work out that the old trick of piping things together will help. For example, if the above commands are re-written slightly to be:

echo "cd /" | targetcli
echo "backstores/block create name=dbdata dev=/dev/mapper/vg1-asquith--dbdata" | targetcli
echo "iscsi/ create iqn.2014-07.home.dizwell:asquith.dbdata" | targetcli
echo "cd iscsi/iqn.2014-07.home.dizwell:asquith.dbdata/tpg1" | targetcli
echo "portals/ create" | targetcli
echo "luns/ create /backstores/block/dbdata" | targetcli
echo "set attribute authentication=0" | targetcli
echo "set attribute generate_node_acls=1" | targetcli

…then each of the commands in double-quotes will be passed through to the targetcli shell in turn and executed in just the same way as if you’d typed things interactively.

Excellent… but I need these commands in a slightly different context. What I’m after is for Kickstart to create a shell script that contains these commands so that it can then execute that shell script later on to automagically set up iSCSI target sharing when building a CentOS 7 Asquith Server. That means I need Kickstart to run commands which create a script which contains these commands. And at that point, I’m asking for the commands to be re-written in the following manner:

echo "echo \"cd /\" | targetcli" >> /root/iscsiconfig.sh
echo "echo \"backstores/block create name=dbdata dev=/dev/mapper/vg1-asquith--dbdata\" | targetcli" >> /root/iscsiconfig.sh
echo "echo \"iscsi/ create iqn.2014-07.home.dizwell:asquith.dbdata\" | targetcli" >> /root/iscsiconfig.sh
echo "echo \"cd iscsi/iqn.2014-07.home.dizwell:asquith.dbdata/tpg1\" | targetcli" >> /root/iscsiconfig.sh
echo "echo \"portals/ create\" | targetcli" >> /root/iscsiconfig.sh
echo "echo \"luns/ create /backstores/block/dbdata\" | targetcli" >> /root/iscsiconfig.sh
echo "echo \"set attribute authentication=0\" | targetcli" >> /root/iscsiconfig.sh
echo "echo \"set attribute generate_node_acls=1\" | targetcli" >> /root/iscsiconfig.sh

The levels of indirection here start to do my head in!

Take the first command: echo "echo \"cd /\" | targetcli" >> /root/iscsiconfig.sh

That’s my earlier echo-and-pipe-to-targetcli command wrapped up in an echo command of its own. Why? Because Kickstart will perform the ‘outer echo’ and thus write a line of text reading just echo "cd /" | targetcli to a shell script called iscsiconfig.sh. So when Kickstart later runs iscsiconfig.sh, the correct targetcli command is finally run.

So basically, we’re nesting an echo inside an echo. Kickstart will run the ‘outer echo’ so that the ‘inner echo’ command gets written to a script file. Only when that script is itself later run will the ‘inner echo’ actually be run and do anything.

Now, the way I’ve written my original inner echoes is perhaps peculiar to me: double quotes make visual sense to me and, for plain strings like these (no variables or other characters for the shell to expand), there’s no functional difference in Bash between scripting with double or single quotes. Without a major functional difference, I prefer doubles. But if you start nesting your echoes, the “inner echo” has to escape its double quotes, otherwise they get treated as delimiters that end the outer echo’s string, not as literal characters to be written out.

In other words, the command we eventually want to run might be:

echo "cd /" | targetcli

…where the double-quotes are not escaped, because they delimit what is to be echoed. But the command we have to run to get this command written into a shell script is:

echo "echo \"cd /\" | targetcli" >> /root/iscsiconfig.sh

So the ‘outer’ echo is now a command to write the words echo "<something>" into a shell script, but the double-quotes used by this inner echo have to be regarded as literal text, not as parts of the outer echo command. Hence they need to be preceded by a “\” to turn them into literals (“escaped”, in the lingo).

Is your head hurting yet?! It gets worse (a bit)!

Remember that these echoed echo commands are being written into a shell script. Shell scripts need to start with a line saying where the shell executables are to be found. In the world of the Bourne Again Shell, that means starting things with a line which reads:

#!/bin/bash

Now, we want a command that echoes that into a shell script, before we later go on to execute the shell script. No problems …we just do this:

echo "#!/bin/bash" > /root/iscsiconfig.sh

This is merely as before: we’re wrapping the command we eventually want executed inside an echo statement, using double quotes as delimiters to define what gets echoed, and finishing off with a redirection out to the shell script that will contain the command.

Except that it won’t work:

# echo "#!/bin/bash" > /root/iscsiconfig.sh
-bash: !/bin/bash": event not found

You might think that one or more of the “shebang” characters (the ”#!” at the start of the line being echoed) need escaping, so that something like

echo "\#\!/bin/bash" > /root/iscsiconfig.sh

…would do the trick. And indeed, the above escaped command will “work” instead of producing an ‘event not found’ error, but it doesn’t work very well! Just try displaying the contents of the file created with that modified command:

# cat iscsiconfig.sh
\#\!/bin/bash

This shows you that the escape characters have actually become part of the contents of the shell script, rather than interpreted as escape characters. Present as literals, though, the escape characters mean the shell script can’t actually work when invoked. So this won’t do.

Odd though it might seem at first sight, this behaviour is actually perfectly cromulent and well-documented in the Bash manual. Specifically:

A double quote may be quoted within double quotes by preceding it with a backslash. If enabled, history expansion will be performed unless an ‘!’ appearing in double quotes is escaped using a backslash. The backslash preceding the ‘!’ is not removed.

So what’s the fix? Well, the simplest I can think of is to …er, use single quotes. You’ll find that:

echo '#!/bin/bash' > /root/iscsiconfig.sh

…works in the sense of not itself returning an error AND works in the sense that it writes the correct command into the shell script file we’re trying to create.
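It may also be worth knowing that history expansion is an interactive-shell feature: Bash only switches it on by default when you are typing at a prompt. Inside a script that is run non-interactively (which is presumably how a Kickstart %post section gets run, though that is an assumption rather than something tested here), the double-quoted version should be left alone. A minimal sketch, with the function name invented for illustration:

```shell
#!/bin/bash

# Runs the double-quoted echo in a separate, non-interactive bash.
# History expansion is off there by default, so "!" is not special.
shebang_noninteractive() {
  bash -c 'echo "#!/bin/bash"'
}

shebang_noninteractive
```

At an interactive prompt, set +H turns history expansion off for the rest of the session, which is another way round the ‘event not found’ error. Single quotes remain the simplest fix, since they behave the same way everywhere.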

But now, at this point, you realise it’s a bit silly to have a mix of single and double-quotes in the same set of commands, so you think that you could go back to the original targetcli commands and re-write them using single quotes (after all, the manual makes it clear that there’s no real difference between single and double quotes except for the way four literal characters are treated).

This means your ‘doing it directly’ commands would be written as:

echo 'cd /' | targetcli
echo 'backstores/block create name=dbdata dev=/dev/mapper/vg1-asquith--dbdata' | targetcli
echo 'iscsi/ create iqn.2014-07.home.dizwell:asquith.dbdata' | targetcli
echo 'cd iscsi/iqn.2014-07.home.dizwell:asquith.dbdata/tpg1' | targetcli
echo 'portals/ create' | targetcli
echo 'luns/ create /backstores/block/dbdata' | targetcli
echo 'set attribute authentication=0' | targetcli
echo 'set attribute generate_node_acls=1' | targetcli

And they do, indeed, all work as advertised in this form. But now you want to apply that ‘layer of indirection’ that comes from the fact that you’re writing a script to write a script… so you might end up with this:

echo '#!/bin/bash' > /root/iscsiconfig.sh
echo 'echo 'cd /' | targetcli' >> /root/iscsiconfig.sh
echo 'echo 'backstores/block create name=dbdata dev=/dev/mapper/vg1-asquith--dbdata' | targetcli' >> /root/iscsiconfig.sh
echo 'echo 'iscsi/ create iqn.2014-07.home.dizwell:asquith.dbdata' | targetcli' >> /root/iscsiconfig.sh
echo 'echo 'cd iscsi/iqn.2014-07.home.dizwell:asquith.dbdata/tpg1' | targetcli' >> /root/iscsiconfig.sh
echo 'echo 'portals/ create' | targetcli' >> /root/iscsiconfig.sh
echo 'echo 'luns/ create /backstores/block/dbdata' | targetcli' >> /root/iscsiconfig.sh
echo 'echo 'set attribute authentication=0' | targetcli' >> /root/iscsiconfig.sh
echo 'echo 'set attribute generate_node_acls=1' | targetcli' >> /root/iscsiconfig.sh

I have to say, I don’t like the look of that, because gut instinct tells me double-quotes are needed there somewhere (and since, in this case, single and double quotes are interchangeable, there’s no reason why you couldn’t come up with a rule that says ‘inner echoes get doubles, outer echoes use singles’ and thus end up with something that looks a bit clearer than the above and yet remains consistent!).
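That gut instinct is worth a quick experiment. In the nested-singles form, the inner quotes don’t actually nest: each one simply ends or restarts the outer string, so the quotes themselves never reach the file. The sketch below (function names invented for illustration) prints what each style of outer echo would write:

```shell
#!/bin/bash

# Nested single quotes: the inner quotes close and reopen the outer
# string, so they vanish from the text that gets written.
nested_singles() {
  echo 'echo 'cd /' | targetcli'
}

# Outer singles, inner doubles: the inner quotes are ordinary characters
# inside a single-quoted string, so they survive.
singles_outside() {
  echo 'echo "cd /" | targetcli'
}

nested_singles     # writes: echo cd / | targetcli
singles_outside    # writes: echo "cd /" | targetcli
```

The quote-less line still does the job here only because echo joins its arguments back together with single spaces; a command containing runs of spaces or shell-special characters would not come through unscathed.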

But it will all work.

Which is the main thing 8-o

Though my head still hurts!

(Incidentally, you don’t need to point knowingly, laughing your head off all the while, exclaiming that I should have used a here document technique to avoid all those nested echoes in the first place. It’s true and I know (and knew) it. But lots of single-line echoes are how I start writing stuff. Only when I know it works do I start applying layers of ‘elegance’ by code tidy-ups such as that. This blog was dedicated to those whose heads hurt, not those who can write shell scripts more elegantly than me… there are far too many of those!)
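For the record, a here document version might look roughly like this. It is a sketch rather than the code Asquith actually uses, and the function name write_iscsi_script is made up. Quoting the EOF marker tells Bash to copy the body verbatim, so nothing inside it needs escaping at all:

```shell
#!/bin/bash

# write_iscsi_script FILE
# Writes the whole targetcli wrapper script in one go. The quoted 'EOF'
# marker stops the shell expanding or unquoting anything in the body.
write_iscsi_script() {
  cat > "$1" << 'EOF'
#!/bin/bash
echo "cd /" | targetcli
echo "backstores/block create name=dbdata dev=/dev/mapper/vg1-asquith--dbdata" | targetcli
echo "iscsi/ create iqn.2014-07.home.dizwell:asquith.dbdata" | targetcli
echo "cd iscsi/iqn.2014-07.home.dizwell:asquith.dbdata/tpg1" | targetcli
echo "portals/ create" | targetcli
echo "luns/ create /backstores/block/dbdata" | targetcli
echo "set attribute authentication=0" | targetcli
echo "set attribute generate_node_acls=1" | targetcli
EOF
  chmod +x "$1"   # make the generated script executable
}
```

Kickstart would then call write_iscsi_script /root/iscsiconfig.sh once, instead of running nine separate echoes.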

CentOS 7 and Kickstart

I have long since given up hoping that things which work fine in one version of Red Hat/CentOS/etc will continue to work in the next version, unmolested. But it’s still darn’d annoying when stuff you know works fine in version X breaks in slightly mysterious ways in version X+1. Kickstart (the tool for automating CentOS/Red Hat deployments) is a case in point.

A Kickstart file which worked fine for CentOS 5.x, for example, turns out to contain entirely the wrong syntax as far as CentOS 6.x is concerned -so a re-write is required to make it functional once more.

The bad news is that this pattern continues with CentOS 7, to the point where even invoking a Kickstart installation has changed (and accordingly cost me quite a lot of time tracking down precisely where the problem is).

Short version: in the past, you invoked a Kickstart installation by pressing Tab on the first boot menu and then typing something like ks=hd:sr1/kickstart.ks. Now you do it by pressing Tab on the first boot menu and typing ks=cdrom:/dev/sr1:/kickstart.ks. It’s a subtle change but it makes all the difference!

Longer version: the change is explained pretty well in this bug report for Fedora 19 (Red Hat 7 and its clones are really a re-jigged version of Fedora 18/19, so the bug reports for the one often apply to the other).

Actually, the new syntax is clearer and more logical than the old, so I probably shouldn’t complain… but I’m in that sort of mood at the moment, so I will :-)

There are lots of other changes, too, of course: software package groups have changed, which makes a version 6.x script useless for doing 7.x installs, just for starters.

All of which is by way of explanation for late delivery on promised RH7-ish versions of Asquith and Salisbury. They are coming (update: no they’re not!), but somewhat more slowly than I had anticipated, because of the “fun” I’ve been having with version 7’s Kickstart!

CentOS 7 arrives…

CentOS 7 has been released and is available for download from the usual places. I’ll be adapting (or trying to!) Salisbury and Asquith to work with it over the next few days and you can expect updates to those frameworks as I do.

I go for my cataract removal operation next Monday, however, so although I will be at home for a couple of days afterwards (and thus have plenty of time to do the deed), I might still be bumping into things and thus not at my most efficient. Asquith/Salisbury v.2 when I can, therefore, but no promises.