Backing up Google Data

So, if we accept the premise, for the moment, that it might be a good idea to stop using Google services, how do you get back the data you’ve already given them?

Surprisingly, perhaps, it’s really quite easy. Your main port of call will be https://www.google.com/takeout, which allows you to package up your data from Google’s various services and download it as a simple zip file. The download of any photos uploaded to Google’s Picasa servers was swift and pain-free, for example.

But there’s a huge gap in Takeout’s offerings: the email! There’s no easy way to get all your Gmail downloaded using Google’s own tools.

A cross-platform, comprehensive and zero-cost tool to achieve that is, however, available as a simple software download: Gmail-Backup. Point it at your email account and a local folder in which it can store its work, and pretty soon you’ll have a bazillion “.eml” files, each one representing a separate email. Handily, it doesn’t restrict itself to your inbox: anything in your sent mail folder is backed up, too.

Reading the exported mail is perhaps not as easy as it should be. Thunderbird will do it without fancy filters or import tools: just drag and drop the multiple EML files into a suitable folder and the thing will chug along until all files are across.

Once they’re in Thunderbird’s format, it’s trivial to get them into something like Evolution. The tricky bit is simply finding Thunderbird’s own data store: on Windows 7, it’s C:\Users\<username>\AppData\Roaming\Thunderbird\Profiles\<some random number>.default\Mail\Local Folders.

Somewhere in there should be the folder representing all your Gmail conversations -mine was a single 156MB file. You then just copy that in its entirety to a place where Evolution can make use of it. Run Evolution’s import wizard, point it at the Gmail file, and it should correctly interpret its contents, importing them to a directory of your choosing.

And if you’re like me, you’ll suddenly realise that you replied to people in 2006 with somewhat less than subtlety and pleasantness. Belated apologies if you were one of those recipients.

There are other ways to extract your Google email -native Linux ones, for example- but none works as simply or as comprehensively as Gmail-Backup. So, for now, I’ll leave it there: all my images, documents, and emails, sent and received, dating back to 2003, are now safely stored on my own servers once more.

It’s kind of liberating… and it allows me, finally, to do this:

Life without Google

I’ve finally come to the conclusion that Google, try as it might to ‘do no evil’, has been progressively falling into the monopolist’s trap of doing whatever the hell it feels like doing. Its latest arbitrary change of terms and conditions, in which it reserves to itself the right to “combine information you’ve provided from one service with information from other services”, is the last straw for me.

It means, essentially, that every Google service will track you and keep a history of what you’re typing whenever you use one of them, sharing the information (for Google’s monetary reward) with all the others. If you’re logged into Gmail, every search you perform on google.com in a different browser tab will be attributed back to your Gmail account. If you just want to upload some photos to Picasa, tough: that has to be associated with your Google+ account… and so on.

Already, Google have built up a pretty accurate picture of me:

The new terms and conditions can only mean more information will be fed into producing this sort of thing, whether I really like it or not. (Check your own profile out by logging into, say, Gmail and then visiting this site).

Well, I’ve had enough of this. I “repatriated” my email to my own domain’s server a few weeks ago: where before everything went via Gmail’s servers, lately I’ve only used Gmail to read the contents of my own POP3 servers. Its spam filters are excellent, so there was method in that particular bit of round-about madness. But no more even of this minimal Gmail involvement: as of yesterday, I read my emails in Evolution directly, relying on my email server’s SpamAssassin and Evolution’s client-side junk filter.

The Google Chrome web browser is also being uninstalled from all my PCs. In its place: Opera. That used to be a bit of a risky choice, back in the day. But nearly all websites are HTML5-compliant these days -or getting there- and so they generally work pretty much identically across all browsers. I don’t get all the extensions that you can plug into Firefox, it’s true. But I do get ad-blocking and script-blocking if I want them, which suits me. As a bonus, I get Opera’s simple way of synchronising bookmarks between different PCs (I have never quite understood why Firefox’s should be so complicated!)

What about the biggie? Search, that is. Well, I’ve switched to the rather ridiculously-named DuckDuckGo. It has an incredibly clean interface (Google used to have one of those, if you remember, before it got greedy) and the search results seem fine to me. What about the convenience of just typing a search term into the browser’s search panel or even its main address bar? Easy: Opera comes with a DuckDuckGo selector for the search panel. Just click Opera > Settings > Preferences > Search then double-click the entry for DuckDuckGo, click the [Details] button and switch on the options to make DuckDuckGo your default search and Speed Dial engine.

(If you had decided to use Firefox as your main browser instead, just add yourself a DuckDuckGo Search Extension. That gets the Firefox search panel going to ddg by default, but to get the address bar doing the same thing, you need to open a new tab, type about:config, agree that you’ll be careful, find the keyword.url setting and alter it to read http://duckduckgo.com/?q= …problem solved.)

What else? Ah yes… Picasa. That’s tricky. Flickr is the obvious free photo-hosting replacement here, but it’s part of the Yahoo! empire -and I regard them as not much different from Google in their desire to insert their tentacles everywhere. They’re just not quite as good as Google at doing it! So, I’d prefer to give Flickr the flick. There’s always photoshop.com, of course: you’d expect Adobe to know how to handle photography! It does mean having to use Flash in your browser, though, which I’d prefer not to have to do. And so this one is tricky: I honestly don’t have a definitive answer to it as yet. Maybe I’ll have to repatriate this to my own servers, too, in the end (Gallery works quite well, for example).

Of course, my Google+ account will be going the way of the dodo in the near future. I have a Facebook account, but they’re actually much worse than Google, so I don’t exactly use it very often! Maybe I’ll just have to be antisocial for a while.

I know of no good alternatives for Google Maps or Street View (none that don’t involve using Microsoft’s efforts, anyway). But having done everything else I’ve mentioned in this post, at least my use of these tools won’t be attributable to me as an individual.

Paranoia, I hear you say? Yeah, probably. But you take your stand on these things as you see them. I disliked Microsoft ruling the roost a few years ago; I now run a mostly Windows-free home. Now I am nervous about Google’s ambitions and its proposed privacy infringements to achieve them; in response, I simply choose to switch off as much Google infrastructure in my life as possible. Not something everyone will do, I realise. But maybe everyone can at least think about the issues!

Minimising Evolution

You want to run Evolution as your email client, but you then notice it doesn’t have a ‘keep running, but minimized in the system tray, so it can keep checking for new email periodically without cluttering up my taskbar’ option.

The official line on this seems to be that this is an insane requirement and that to satisfy it would count as “abuse” of the perfect Gnome environment… but that’s just typical arrogant bluster from the Gnome and/or Evolution developers. However, because it’s the official line, there really is no native, built-in way to do this very simple deed.

Here’s one possible solution, however, which I found met my requirements without necessitating that I take a degree in advanced physics to make it happen.

  • Download the AllTray tarball, then right-click it and extract it
  • As root, change to the alltray-0.70 directory and issue three commands, in sequence:
./configure
make
make install
  • Alter the launcher for Evolution (on the top panel, for example) so that instead of just triggering the command “evolution” it now triggers the command “alltray evolution”.

When you now launch Evolution with that quicklaunch icon, the program will immediately minimize itself to the system tray. An envelope icon will appear there. Click it once to make Evolution appear full-screen, once more to re-minimize (or, click close). To genuinely close the program, right-click the envelope icon and click Exit. Evolution will momentarily appear full-screen before closing itself.

Happy Australia Day

January 26th, as it shall shortly be in these parts, is Australia Day.

‘Tis traditionally a day for drinking copious quantities of what the locals quaintly call ‘beer’ and burning assorted meats to buggery on the barbecue.

I have experience with this last requirement for true-blue, dinky-di Aussie-dom:

The outcome was not exactly a culinary triumph that particular year:

The good news this year for food-lovers is this:

That’s the current weather over Sydney. Blue is rain; the darker, the wetter. Yellow is sheets of it. To switch metaphors, the short story is that it’s bucketing down, and likely to stay that way for the rest of the week, apparently. As my old school fête used to say: Indoors if wet!

The upside is that I won’t have to share my food with such native inhabitants of these shores as:

…or, worse:

(for she is venomous).

But I may still have to share my cups of tea with others:

So, when I tuck into my bonza lamb roast indoors, I’ll not be battling flies or arachnids just for once and can therefore wish you all a happy Australia Day and mean it.

OpenAaarghDAP!

If there is a ghastlier bit of technology than LDAP, I don’t know what it is. The very name is redolent of nerds laughing at you: Lightweight Directory Access Protocol? Lightweight??! Very funny, guys!

There’s nothing really lightweight about it, of course. It’s a mostly-incomprehensible melange of apparently bizarre syntax and a propensity to throw a wobbly at the slightest misplaced comma or out-of-position double quote. What’s more, its documentation sucks and it’s difficult even to find anyone explaining why you might want to do battle with it in the first place.

And the cause of the ‘Aaaargh!’ in this blog’s title is that it gets even better: the entire methodology, structure, call it what-you-will of the OpenLDAP implementation of LDAP changed between Red Hat Enterprise Linux 5.x and 6.x. In the old days, we configured a slapd.conf configuration file; these days, you are supposed to poke around inside a directory called, fetchingly, /slapd.d/cn=schema. Yup, they stuck an equals sign in the directory name. Just wait until you have to edit such catchy files as olcDatabase={0}config.ldif or olcDatabase={-1}frontend.ldif. I mean, what were they on when they came up with those filenames??!
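To give a flavour of the new regime: instead of editing a configuration file, you feed LDIF fragments to ldapmodify against the live cn=config database. A hypothetical sketch of what changing a database suffix now looks like (the {2}bdb database name and the example suffix are illustrative assumptions -your directory will differ):

```
# Sketch only: change the suffix of the second configured database.
# Apply with: ldapmodify -Y EXTERNAL -H ldapi:/// -f thisfile.ldif
dn: olcDatabase={2}bdb,cn=config
changetype: modify
replace: olcSuffix
olcSuffix: dc=example,dc=com
```

One line of slapd.conf has become a five-line LDIF incantation, which rather makes my point for me.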

What little documentation you may find on the old slapd.conf way of doing things is mostly meaningless in this brave new world of RHEL6-style madness. So back to square one, then… which is to say, fumbling around in the dark in the hope you’ll get lucky.

Ordinarily, at this point I’d give up, consigning the whole thing to the too-hard basket. But there is actually a good reason to do battle with OpenLDAP: it makes for a free (and ultimately relatively simple) way of doing centralized TNS names resolution for Oracle databases.

Of course, the official way of doing that is to use Oracle’s own Oracle Internet Directory (OID) -and I did actually write an article about setting that up way back in about 2006. But there are lots of problems with OID. It’s a tiny part of the gigantic ‘Oracle Identity Management’ infrastructure for one thing: I defy anyone who isn’t religiously following some documentation to go to www.oracle.com and download the correct bit of OIM in order to just get the names resolution stuff working! For another thing, I never did get OID working on any version of Red Hat EL later than 4.7. I don’t know if that’s changed (I presume it has), but it was a bit of a show-stopper at the time, that’s for sure! Dare I mention at this point that RHEL 4.7 is to reach its end-of-life on February 29th 2012… er, in approximately 6 weeks, in other words?

I suppose I might also throw in the fact that I would prefer ‘completely and definitely free’ to ‘possibly, but probably not, but just maybe comes with licensing issues that are associated with a price tag’.

All of which is by way of explaining why I decided to (a) get OpenLDAP running on RHEL 5.x and 6.x -and then (b) get OID functionality working with it in both configurations. I then decided to throw in (c): a nice GUI way of working with the LDAP monster.

The OpenLDAP-as-OID-Replacement article is therefore now available.

An Artist Writes…

It is well-known in certain circles that I possess all the artistic talent of a comatose baboon who’d had his fingers trapped in a coffee grinder whilst an elephant danced a lengthy fandango on his toes. Not much, in other words.

Which does, I think, go some way to explaining why, whenever I’ve tried to produce network topology diagrams in the past, no matter whether I’m using Visio (on a Windows PC at work) or Dia (on a Linux PC at home), they’ve always come out looking like a deranged two-year-old had given it a whirl and then decided that ice cream sounded like a better idea 48 seconds later.

I tender in evidence my latest effort:

The network itself is a thing of majesty. My feeble attempts to represent it during my hours of unemployment… not so much.

I will shortly have two new (physical) servers to accommodate in that mess somewhere, too -one of them a nice dual Xeon, 24GB affair. Time to read some more physics books (and the Dia manual, I guess).

Kickstart and DHCP

Here’s a tangled web to unpick at your leisure!

Back in December, I finally switched off my 7 year-old laptop (a sturdy Thinkpad X40) that had been doing practically nothing except running Windows XP, on which was running my ISP’s Internet Connection software, and also sharing that Internet connection with Microsoft’s built-in ICS (“Internet Connection Sharing”). Effectively, my Thinkpad had been running as a router for years. In its place, I plugged the wireless Internet USB device into a Scientific Linux server and switched on IP forwarding: a little bit of re-configuration of my various clients (laptops, netbooks, desktops) and all could still access the Internet, but now without a flaky laptop (or a flaky XP installation) in the path.

So far so good.

Cut to today, when I’m building a Centos 6.2 virtual machine, using my now-standard technique of installation over the network (as described in my earlier Network Linux Installs article). Every time I started the installation, it bombed out with a warning that ‘Network Manager couldn’t configure the eth0 interface’. I tried Scientific Linux 6.1, which I’ve used a bazillion times in the past… same story.

So I fiddled. Since it was apparently a networking problem, I switched my virtual machine from “bridged” to “NAT” and tried Scientific Linux 6.1 again: and it worked! I didn’t pause to think why NAT worked when bridged didn’t, but simply ticked it off as ‘fixed’.

Back, therefore, to trying to build a Centos 6.2 VM, this time using the now-proven “NAT” technique. Except that this time, it didn’t work. It started to, sure enough: at least I didn’t get the ‘Network Manager can’t configure’ error again. Indeed, the installation process correctly read my kickstart file and a couple of the installation files as stored on my web server. But then it failed to read the install.img file:

Now that just made no sense to me: the path mentioned was absolutely, 100% guaranteed to be correct… and I checked it multiple times to be certain of that. What’s more, “install.img” is merely the third file read during the installation process (updates.img and product.img are read first). So how come the network was fine for two files but bombed out on the third?!

I spent the best part of a day checking and triple-checking my entire setup: physical host to virtual web server connectivity was fine; paths as specified in my kickstart file were fine; IP configuration as specified in the kickstart file was fine… all dead-ends.

And then, for no particular reason that I can now recall, I decided to do a quick yum install dhcp on my web server. A slightly less quick edit of /etc/dhcp/dhcpd.conf and I had myself a working DHCP server.
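The edit itself amounted to very little. For the curious, a minimal sketch of the sort of thing that goes into /etc/dhcp/dhcpd.conf -the subnet, address range and router here are illustrative assumptions you’d replace with your own network’s details:

```
# Minimal dhcpd.conf sketch -- all addresses are illustrative.
ddns-update-style none;
default-lease-time 600;
max-lease-time 7200;

subnet 192.168.1.0 netmask 255.255.255.0 {
  range 192.168.1.200 192.168.1.220;       # pool of addresses to hand out
  option routers 192.168.1.1;              # default gateway for clients
  option domain-name-servers 192.168.1.1;  # DNS server for clients
}
```

A quick service dhcpd start afterwards and the server is in business.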

And guess what? The Centos 6.2 install worked perfectly!

What’s more, I then went back and tried both Centos 6.2 and Scientific Linux 6.1 installations with the “bridged” network option, and both of them worked fine too!

Apparently, therefore, DHCP is essential for network installs of Linux. Why hadn’t I ever noticed this before? Because of Microsoft, that’s why!! (They have to be responsible for all the bad things that happen, right?) The Internet Connection Sharing software my faithful Thinkpad had been using to act as a router automatically switches on both DNS and DHCP. That ancient laptop had, in fact, been this house’s DHCP server without me ever really worrying about it. When I retired it in December, therefore, I lost DHCP -and although I’d been sharing the Internet connection on the Linux box, I’d not installed DHCP anywhere to take its place. I knew all of that: but it didn’t concern me.

The reason I wasn’t concerned is that we simply don’t use DHCP in this house! (Not knowingly, anyway!!) All my servers, desktops, laptops and what-have-you are carefully assigned fixed IP addresses. I even have a spreadsheet showing which addresses have been allocated and which are free! So, since we don’t use DHCP, losing a DHCP server by retiring ICS really didn’t bother me.

But suddenly, today we have proof that DHCP was necessary after all :-(

Now, thinking about this with 20-20 hindsight, it’s all a bit obvious. How can a server you’re trying to build talk to a remote web server unless it has already acquired an IP address? Sure, the kickstart file it finds on the web server will eventually grant a proper, static IP address… but how does it get to read that file in the first place unless it already has some form of network communication with the remote server on which it’s stored?

That initial IP address is acquired easily in NAT mode: it’s the same address your physical host uses. Hence both builds in NAT mode at least started off fine, and Scientific Linux’s was completely successful. When you run in bridged network mode, though, your VMs have to acquire that initial IP address the same way any physical machine would: from a DHCP server. And since I didn’t have a DHCP server any more, not even a non-obvious one from Microsoft that’s surreptitiously switched on by enabling ICS, all my bridged builds started to fail.

The moral of the story: you need a DHCP server if you’re going to do networked/Kickstarted installs of Linux, even if you don’t intend using DHCP after the initial installation has completed.
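In kickstart terms, the distinction looks something like this. The network directive decides what the built machine ends up with, but the installer still has to get itself onto the network first in order to fetch the kickstart file at all (the addresses here are illustrative assumptions):

```
# Either: let the installer and the finished machine use DHCP...
network --bootproto=dhcp --device=eth0

# ...or: nail down a static address in the finished build. The installer
# still needed a working network connection before it could read this line!
network --bootproto=static --device=eth0 --ip=192.168.1.50 --netmask=255.255.255.0 --gateway=192.168.1.1 --nameserver=192.168.1.1
```

You’d use one or the other, of course, not both.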

Now, a trawl around Google tells me that this is not necessarily a cast-iron rule: apparently, there are non-DHCP ways around this. But I’ve not tried the workaround suggested -and DHCP is so trivially easy to get going anyway that it’s not something I’m desperate to avoid using.

Of course, one mystery remains: in NAT mode without a DHCP server, the Centos installation started well enough and then bombed out; in exactly the same configuration, the Scientific Linux install sailed through the whole thing without incident. It would seem as if Centos starts off by using the physical host’s IP address (thanks to NAT) but then tries to switch to acquiring its own address (which, without a DHCP server, it can’t do), whereas Scientific Linux appears to make do with its initial IP address just fine.

Why this difference in behaviour? As I say, it’s a mystery at this stage -which makes this sort of thing fun and infuriating in equal measure, I guess!

VirtualBox Installations on Scientific Linux

I’ve mentioned previously that my preferred virtualization platform is VMware Workstation. That remains true… but VirtualBox does have the distinct advantage of being free of charge. So unless I want to insist that my readers find AU$291.50 for VMware’s offering, it behooves me instead, from time to time, to use the virtualization platform I know we can all afford.

So here is my two-minute recipe guide to getting the latest VirtualBox product installed (for zero dollars!) on Scientific Linux 6.1. (You could always just download the relevant rpm and install it directly, but I prefer to do all my package management via yum wherever possible, so that’s the approach described here).

1. Get the gpg key

VirtualBox is supplied as a bunch of rpm packages which have been digitally signed. By checking the signature, you know no-one’s messed about with the packages before they reached you. It therefore makes sense to obtain and install the digital key needed to do that signature check. It’s easy to do, as root, at a command prompt, by issuing these two commands:

wget http://download.virtualbox.org/virtualbox/debian/oracle_vbox.asc
rpm --import oracle_vbox.asc

2. Create the yum repository

Again as root, issue this command to create a new, blank repository file:

gedit /etc/yum.repos.d/virtualbox.repo

Now paste these contents into the empty file:

[virtualbox]
name=RHEL/CentOS-$releasever / $basearch - VirtualBox
baseurl=http://download.virtualbox.org/virtualbox/rpm/rhel/6.0/$basearch
enabled=1
gpgcheck=1
gpgkey=http://download.virtualbox.org/virtualbox/debian/oracle_vbox.asc

Save the file changes and close down gedit. Just in passing, you might note that I’ve changed this file from the version which Oracle themselves make available on the VirtualBox website. Specifically, their baseurl uses a yum variable, called $releasever, where I have hard-coded the number 6.0. The trouble, of course, is that if you are using the latest versions of Scientific Linux or Centos, you’ll be picking up a releasever of 6.1 or 6.2 …and no such directory exists on the VirtualBox servers. You’d need to check the server’s directory structure manually to see if that situation changes over time.

3. Install the Software

As root once more, the following one-liner will display all the different versions of VirtualBox that are available for installation:

yum search VirtualBox

You might see this sort of output in return:

Loaded plugins: refresh-packagekit
virtualbox | 951 B 00:00
virtualbox/primary | 4.4 kB 00:00
virtualbox 17/17
============== N/S Matched: VirtualBox ==============
VirtualBox-3.2.x86_64 : Oracle VM VirtualBox
VirtualBox-4.0.x86_64 : Oracle VM VirtualBox
VirtualBox-4.1.x86_64 : Oracle VM VirtualBox

This shows that Oracle keeps a couple of older versions of the software alive and available, should you need to use them. Most people, though, will really only need the latest version, so pick that from the list and issue an appropriate “yum install” command. In my case, given the above output, this command will do the right thing:

yum -y install VirtualBox-4.1.x86_64

It’s a 58MB download or so, and as it’s installed you might see this message appear:

Running Transaction
 Installing : VirtualBox-4.1-4.1.8_75467_rhel6-1.x86_64 1/1
Creating group 'vboxusers'. VM users must be member of that group!

This gives you the clue to the last stage of the installation process…

4. Assigning Group Privileges

The software installation has created a new O/S group, called vboxusers, but it won’t have made your user account a member of that group. That needs to be fixed.

From the Gnome top panel, click System > Administration > Users and Groups. Find your user details on the Users tab and double-click the entry. Switch to the Groups tab, scroll down and check the vboxusers group name:

Click [OK] to save the change, and you’re done, though you’ll need to log off and back on before the group membership changes take practical effect.

If you prefer doing everything at the command line, you can simply run usermod -a -G vboxusers your-username as root, or edit /etc/group directly (as root, of course) and add your username to the end of the vboxusers line, which will probably be the last line of the file. In my case, for example, the line ended up reading vboxusers:x:501:hjr -which simply means that user ‘hjr’ is now a member of the vboxusers group. (Again, it’ll take a log off and fresh log on before the new group membership actually takes effect).

Either way, you’re now done and can run the VirtualBox program successfully, with the program launcher being found in Applications > System Tools.

5. USB Support

The version of VirtualBox installed by the above procedure will be unable to access USB 2.0 devices that might be plugged into your physical host. However, this shortcoming can be fixed by installing the “Oracle/VirtualBox Extension Pack”. Download it from the VirtualBox website and then just double-click the file when the download completes. You should see the following appear:

Click the [Install] button there, agree to the license, authenticate as root and all should be done in a matter of seconds. You’re now ready to build and run fully-functional virtual machines.

Look Ma! No network…

I was installing some servers in Seattle recently, when I was informed that it was not company policy to allow their servers to have Internet access, of any sort, ever. This was a bit of a blow for me, because my Gladstone script (which I use to configure production Red Hat boxes as Oracle database servers) relies on being able to do various “yum install …” commands to get the software prerequisites correct.

It was irritating, though quite understandable -and we worked around the issue by giving me temporary access to the Internet, swiftly revoked once the installs were complete. But the incident made me realise that Gladstone’s reliance on Internet connectivity was misguided.

In fact, it’s never been strictly necessary for Gladstone to have Internet access at all: every one of the software prerequisites is available on the DVD installation media for RCSL distros, so it’s always been possible to install entirely from locally-available media. I used the ‘yum install’ method simply because it was easier: for one thing, it ensured all software dependencies were satisfied automatically.
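Pointing yum at the DVD instead of the Internet is, incidentally, simple enough: mount the installation disc somewhere and drop a repository file into /etc/yum.repos.d that references it. A sketch of the sort of thing involved -the /media/dvd mount point is an assumption, and you’d want to disable or remove the Internet-facing repos at the same time:

```
# /etc/yum.repos.d/dvd.repo -- local installation media as a yum repository.
[dvd-media]
name=Local DVD installation media
baseurl=file:///media/dvd
enabled=1
gpgcheck=0
```

With that in place, ‘yum install whatever’ resolves dependencies from the disc alone, no external network required.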

Well, that particular issue can be resolved with the use of the Palmerston script in conjunction with a Kickstart server. Of course, Palmerston itself does need to be downloaded and run to finish things off in an interactive fashion, but if you download it ahead of time and store it on your Kickstart server, it can be fetched internally, still without recourse to the wider Internet.

Kickstart + Palmerston… perfect results every time, and not an external network in sight. My man in Seattle would be happier, I think!