PostgreSQL on AWS

An excellent video on running PostgreSQL on AWS. The principles are pretty universal, though, and apply to running anything on AWS.


Grizzled

yow, the latest OpenStack release is looking rather slick..


VM Tad

hah, never seen these before: MS making fun of VMware. Didn’t realise Microsoft had it in them!

http://www.youtube.com/watch?v=hewedqvSWaI


Gen Xen

I’ve been working pretty extensively with Xen and Puppet in my new job, really loving it! I’ve been creating a whole load of Xen hosts, most of which are cloned from an initial image I built using Xen-tools. I’ve just finished a script which is over on my github page, which basically automates what was previously a manual process.

Basically, it copies your existing disk.img and swap.img, generates a new xen.cfg file based on some interactive input (desired hostname, IP, memory and number of vCPUs) plus a random Xen MAC address, then mounts the disk.img file and updates the relevant system files: /etc/hostname, /etc/hosts and /etc/network/interfaces.

All quite simple and straightforward, but quite nice to have automated.
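
The random MAC address is the only slightly fiddly bit of that process. Here is a minimal sketch of how it could be done in plain shell. It is not lifted from GenXen itself (the real script may well do it differently), and the function name gen_xen_mac is made up for illustration. It relies on 00:16:3e being the MAC prefix reserved for Xen guests.

```shell
#!/bin/sh
# Sketch only: print a random MAC address in the Xen-reserved 00:16:3e range.
# gen_xen_mac is a hypothetical helper name, not part of GenXen or xen-tools.
gen_xen_mac() {
    # Read three random bytes as hex (e.g. " 4f a2 09") and turn them
    # into ":4f:a2:09" so they can be appended to the fixed prefix.
    suffix=$(od -An -N3 -tx1 /dev/urandom | tr -d ' \n' | sed 's/../:&/g')
    printf '00:16:3e%s\n' "$suffix"
}

gen_xen_mac
```

Three random bytes give roughly 16 million possible addresses, so collisions on a small network are unlikely, but nothing here checks for them.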

GenXen

Here’s the README:

GenXen #
#############################

A script for automating Xen VM deployment.

It requires that you have a base disk.img and swap.img already created.
I created mine with:
xen-create-image --pygrub --size=50Gb --swap=9Gb --vcpus=2 --memory 6Gb --dist=squeeze --dhcp --passwd --dir=/var/virt-machines --hostname=xen-squeeze-base

Fill in some of the variables at the top of GenXen.pl before running, then simply:
./GenXen.pl

The interactive part will ask for hostname, memory size, vCPUs and IP address, then generate a unique Xen MAC address, and write these all to a Xen config file which will be saved in /etc/xen/

It’ll copy your disk.img and swap.img to destination dir, mount the disk.img and create appropriate files for:
/etc/hostname
/etc/hosts
/etc/network/interfaces

After that you should be good to launch with:

xm create -c /etc/xen/whatever-your-hostname-is.cfg
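
To make the file-editing step a little more concrete, here is a rough shell sketch of writing those three files into a mounted image. This is an illustration only, not the GenXen code (which is a Perl script): the function name write_guest_files is invented, and the /24 netmask is an assumption you would need to adjust for your own network.

```shell
#!/bin/sh
# Sketch only: write per-guest identity files into a mounted disk image.
#   $1 = mount point of disk.img, $2 = hostname, $3 = IP address
# write_guest_files is a hypothetical name; the netmask is an assumed /24.
write_guest_files() {
    root=$1; name=$2; ip=$3
    mkdir -p "$root/etc/network"
    printf '%s\n' "$name" > "$root/etc/hostname"
    printf '127.0.0.1\tlocalhost\n%s\t%s\n' "$ip" "$name" > "$root/etc/hosts"
    cat > "$root/etc/network/interfaces" <<EOF
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    address $ip
    netmask 255.255.255.0
EOF
}

# Example (assumes disk.img is already loop-mounted on /mnt):
#   write_guest_files /mnt weebit 192.168.1.50
```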


Vagrant and Chef setup

I’ve been reading through ThoughtWorks’ latest ‘technology radar‘ which led me to look up Vagrant, one of the tools they list as worth exploring.

Vagrant is a framework for building and deploying Virtual Machine environments, using Oracle VirtualBox for the actual VMs and utilizing Chef for configuration management.

Watching through this intro video:

http://vimeo.com/9976342

I was quite intrigued, as it is very similar to what I was looking to achieve earlier when I was experimenting with installing Xen and configuring it with Puppet.

So here’s what I experienced during the setup of Vagrant on my Macbook. I decided to start with a simple Chef install to familiarise myself with Chef itself and its own requirements (CouchDB, RabbitMQ and Solr), mostly by following these instructions.

– CHEF INSTALL –

sudo gem install chef
sudo gem install ohai

Chef uses CouchDB as its datastore, so we need to install it using the instructions here

brew install couchdb

The instructions I list above also contain steps to create a CouchDB user and set it up as a daemon. They didn’t work for me, and after 30 mins of troubleshooting I gave up and went with the simpler option of running it under my own user. In production this will be running on a Linux server rather than my Macbook, so it seemed fair enough:

cp /usr/local/Cellar/couchdb/1.1.0/Library/LaunchDaemons/org.apache.couchdb.plist ~/Library/LaunchAgents/

launchctl load -w ~/Library/LaunchAgents/org.apache.couchdb.plist

Check it’s running okay by going to
http://127.0.0.1:5984/

which should provide something akin to:
{"couchdb":"Welcome","version":"1.1.0"}
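
If you would rather check this from a terminal than a browser, something along these lines should do. It is only a sketch: the function name is_couchdb_welcome is made up, and the usage line assumes curl is installed and CouchDB is listening on its default port.

```shell
#!/bin/sh
# Sketch only: succeed if a response body looks like CouchDB's welcome banner.
# is_couchdb_welcome is a hypothetical helper name.
is_couchdb_welcome() {
    printf '%s' "$1" | grep -q '"couchdb"[[:space:]]*:[[:space:]]*"Welcome"'
}

# Example (assumes curl is installed and CouchDB is on the default port):
#   is_couchdb_welcome "$(curl -s http://127.0.0.1:5984/)" && echo "CouchDB is up"
```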

– INSTALL RABBITMQ –

brew install rabbitmq
/usr/local/sbin/rabbitmq-server -detached

sudo rabbitmqctl add_vhost /chef
sudo rabbitmqctl add_user chef testing
sudo rabbitmqctl set_permissions -p /chef chef ".*" ".*" ".*"

Ok, Gettin’ back to my mission, break out the whipped cream and the cherries, then I go through all the fly positions – oh, wrong mission!

Ok..

brew install gecode
brew install solr

sudo gem install chef-server chef-server-api chef-solr
sudo gem install chef-server-webui
sudo chef-solr-installer

Setup a conf file –
sudo mkdir /etc/chef
sudo vi /etc/chef/server.rb
– paste in the example from:

http://wiki.opscode.com/display/chef/Manual+Chef+Server+Configuration – making the appropriate changes for your FQDN

At this point, the above instructions ask you to start the indexer. However, they haven’t been updated to reflect the changes in Chef 0.10.2, in which chef-solr-indexer has been replaced with chef-expander.

So, instead of running:
sudo chef-solr-indexer

you instead need to run:
sudo chef-expander -n1 -d

Next I tried
sudo chef-solr

which ran into
"`configure_chef': uninitialized constant Chef::Application::SocketError (NameError)"

I had to create an /etc/chef/solr.rb file and simply add this to the file:

require 'socket'

Startup now worked.
If you want to daemonize it, use:

sudo chef-solr -d

Next start Chef Server with:
sudo chef-server -N -e production -d

and finally:
sudo chef-server-webui -p 4040 -e production

Now you should be up and running. You need to configure the command line client ‘Knife’ following the instructions here, under the section ‘Configure the Command Line Client’:

mkdir -p ~/.chef
sudo cp /etc/chef/validation.pem /etc/chef/webui.pem ~/.chef
sudo chown -R $USER ~/.chef

knife configure -i

(follow the instructions at the link – you only need to change the location of the two pem files you copied above)

Ok, so hopefully you’re at the same place as me with this all working at least as far as being able to log into CouchDB, and verifying that Chef/Knife are both working.

– VAGRANT SETUP –

Now, onward with the original task of Vagrant setup…
Have a read over the getting started guide:

Install VirtualBox – download from http://www.virtualbox.org/wiki/Downloads

Run the installer, which should all work quite easily. Next..

gem install vagrant

mkdir vagrant_guide
cd vagrant_guide/
vagrant init

This creates the base Vagrantfile, which the documentation compares to a Makefile: basically a reference file for the project to work with.

Setup our first VM –
vagrant box add lucid32 http://files.vagrantup.com/lucid32.box

This is downloaded and saved in ~/.vagrant.d/boxes/

Edit the Vagrantfile which was created and change the "box" entry to "lucid32", the name of the box we just added.

Bring it online with:
vagrant up

then ssh into it with
vagrant ssh

Ace, that worked quite easily. After a little digging around, I logged out and tore the machine down again with
vagrant destroy

– TYING IT ALL TOGETHER –
Now we need to connect our Vagrant install with our Chef server

First, clone the Chef repository with:
git clone git://github.com/opscode/chef-repo.git

add this dir to your ~/.chef/knife.rb file
i.e
cookbook_path ["/Users/thorstensideboard/chef-repo/cookbooks"]

Download the Vagrant cookbook they use in their examples -

wget http://files.vagrantup.com/getting_started/cookbooks.tar.gz
tar xzvf cookbooks.tar.gz
mv cookbooks/* chef-repo/cookbooks/

Add it to our Chef server using Knife:
knife cookbook upload -a
(knife uses the cookbook_path we set up above)

If you browse to your localhost at
http://sbd-ioda.local:4040/cookbooks/
you should see the three new cookbooks which have been added.

Now to edit Vagrantfile and add your Chef details:

Vagrant::Config.run do |config|

  config.vm.box = "lucid32"

  config.vm.provision :chef_client do |chef|

    chef.chef_server_url = "http://SBD-IODA.local:4000"
    chef.validation_key_path = "/Users/thorsten/.chef/validation.pem"
    chef.add_recipe("vagrant_main")
    chef.add_recipe("apt")
    chef.add_recipe("apache2")

  end
end

I tried to load this up with
vagrant up
however received:

"[default] [Fri, 05 Aug 2011 09:27:07 -0700] INFO: *** Chef 0.10.2 ***
: stdout
[default] [Fri, 05 Aug 2011 09:27:07 -0700] INFO: Client key /etc/chef/client.pem is not present - registering
: stdout
[default] [Fri, 05 Aug 2011 09:27:28 -0700] FATAL: Stacktrace dumped to /srv/chef/file_store/chef-stacktrace.out
: stdout
[default] [Fri, 05 Aug 2011 09:27:28 -0700] FATAL: SocketError: Error connecting to http://SBD-IODA.local:4000/clients - getaddrinfo: Name or service not known"

I figured this was a networking issue, and yeah, the VM has no idea of my Macbook’s local hostname. I fixed this by editing the VM’s /etc/hosts file and manually adding it.
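
For the record, the hosts fix can be scripted so it is safe to run more than once. This is just a sketch: add_hosts_entry is a made-up name, and the 10.0.2.2 address in the usage line is an assumption (it is what VirtualBox’s default NAT networking usually presents as the host machine), so check what your own VM actually sees.

```shell
#!/bin/sh
# Sketch only: append "ip<TAB>name" to a hosts file unless the name is
# already present. add_hosts_entry is a hypothetical helper name.
add_hosts_entry() {
    file=$1; ip=$2; name=$3
    # -F: treat the name literally, -w: whole words only, -q: no output.
    if ! grep -qwF -- "$name" "$file" 2>/dev/null; then
        printf '%s\t%s\n' "$ip" "$name" >> "$file"
    fi
}

# Example, run as root inside the VM (the IP is an assumption, see above):
#   add_hosts_entry /etc/hosts 10.0.2.2 SBD-IODA.local
```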

Upon issuing a
vagrant reload
boom! You can see the Vagrant host following the recipes and loading up a bunch of things, including apache2.

However, at this point you can still only access its webserver from within the VM, so in order to access it from our own desktop browser, we can add the following line to the Vagrantfile:
config.vm.forward_port("web", 80, 8080)

After another reload, you should now be able to connect to localhost:8080 and access your new VM’s apache host.

Using this setup in any sort of dev environment will still need a good deal more work, but for the moment, this should be enough to get you up and running and able to explore both Vagrant and Chef.


Get Off Of My Cloud!

I work around Hadoop a lot at work, however my duties don’t really require me to interact with it much beyond the occasional data-mining work in Hive. Recently, I was intrigued to read that Amazon’s EC2 services are built upon the Xen hypervisor, which is also something I use, but don’t administer. I figured I would combine them into a learning exercise by setting up a Xen server, creating a little virtual cluster, configuring the VMs via Puppet (another project I’ve been wanting a look at) and finally setting up Hadoop on them all. Here’s what happened..

XEN INSTALL --------------------

So, Xen is virtualisation software. The master host where the hypervisor software is installed is called Dom0, and it can have many DomU guests, each a virtual machine running its own guest operating system. In order to install the Xen hypervisor and run Dom0, the kernel needs some modifications, and my first surprise was to discover that recent releases of Ubuntu’s kernel don’t have Dom0 support. (The most recent discussion I could find on this was here, from 2010, so things may have changed.)

My Ubuntu install has been through several years of upgrading and was getting a little crufty, plus the recent move to Unity desktop with Natty 11.04 wasn’t exciting me very much. I’ve also been keen to give Debian another go, as it’s been a few years, so rather than backport a Debian kernel to Ubuntu, I just went for a fresh install of Debian 6.0/Squeeze.

Aside from kernel support, fairly recent hardware seems, if not essential, to make installation a lot easier. You need some BIOS support and CPU support. I’m running on a Dell Precision 390 with BIOS v2.1.2 and an Intel Core 2 Duo CPU. I switched on Virtualization under the Performance section of my BIOS.

Aiight, so following the instructions at http://wiki.debian.org/Xen I installed the kernel and meta package..

apt-get install xen-linux-system

Then to ensure the xen kernel is loaded first, you have to move some files around:
cd /etc/grub.d/
mv 10_linux 50_linux
update-grub

Reboot machine and you should be running with a Dom0 kernel:

root@bitbot:/etc/puppet# uname -a
Linux bitbot 2.6.32-5-xen-amd64 #1 SMP Thu May 5 00:57:12 UTC 2011 x86_64 GNU/Linux

Now install Xen tools..

apt-get install xen-tools

Following some older HOWTOs, I started building my own filesystems for my first guest machine:

dd if=/dev/zero of=/Xen/host/weebit1/root.img bs=1024k count=1024
mkfs -t ext3 /Xen/host/weebit1/root.img
dd if=/dev/zero of=/Xen/host/weebit1/swap bs=1024k count=1024

However, DON’T! There’s no need. Have a read through the instructions here:

xen-create-image --manual

(basically it does everything for you – i.e. creating partitions & swap, formatting, filesystems, xen config file, installing OS)

Then all you need to type is:
xen-create-image --hostname weebit --vcpus 2 --scsi --dist squeeze --pygrub --dir=/Xen/host

(I had tried installing natty first, but was having trouble booting the VM: I ran into "ALERT!  /dev/sda2 does not exist.  Dropping to a shell!")

With Debian systems, the Xen install doesn’t enable the Ethernet bridging by default so you need to edit /etc/xen/xend-config.sxp and uncomment:

(network-script network-bridge)

(Or when you try to start up your VM later you will run into:
xm create /etc/xen/weebit.cfg -c
Error: Device 0 (vif) could not be connected. Could not find bridge, and none was specified)

Restart Xend
/etc/init.d/xend restart
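
If you are setting up more than one Dom0, the uncommenting step is easy enough to script. A sketch, assuming the stock config ships the line commented out as "# (network-script network-bridge)"; the function name enable_xen_bridge is invented for this example.

```shell
#!/bin/sh
# Sketch only: uncomment "(network-script network-bridge)" in a xend config.
# enable_xen_bridge is a hypothetical helper name.
enable_xen_bridge() {
    conf=$1
    tmp=$(mktemp) || return 1
    # Strip a leading "#" (and any spaces after it) from that one line only,
    # then write the result back over the original file.
    sed 's/^#[[:space:]]*(network-script network-bridge)/(network-script network-bridge)/' \
        "$conf" > "$tmp" && cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Example (as root, followed by the xend restart shown above):
#   enable_xen_bridge /etc/xen/xend-config.sxp
```

Lines that are already uncommented are left alone, so running it twice does no harm.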

Once you have your Xen hosts created successfully, you can start them with a command like:
xm create /etc/xen/weebit.cfg -c (where the cfg file matches the name of the host you created)

I created two VM’s – weebit and weebot – (and my master host is called bitbot) – yes, it gets confusing!

PUPPET SETUP ------------------

Okay, two VMs set up with networking, so the next thing was setting up Puppet on them. This part was by far the most troublesome: although there is nothing particularly complex about Puppet, its error messages aren’t the most verbose or helpful, and I found it quite sensitive to permissions and other peculiarities.

Here’s the main docs:
http://docs.puppetlabs.com/guides/configuring.html

This part is not essential (you can override it in configuration), however a default Puppet install expects the server to be called puppet, so it’s best to set up some name resolution. I set it up simply via /etc/hosts.

PUPPET SERVER setup:

apt-get install puppet puppetmaster
update-rc.d puppetmaster defaults

Just to test it locally, run a puppet agent on the server:

puppet agent --test

All looks good, apart from the fact that it doesn’t do anything yet. My first test, following many of the HOWTOs out there, was to fix the sudoers permissions:

vi manifests/site.pp

# fixup permissions on sudo
class sudo {
  file { "/etc/sudoers":
    owner => root,
    group => root,
    mode  => 440,
  }
}

node default {
  include sudo
}

In order to give it something to fix, I changed the ownership to my own user and altered the permissions:
chown thorsten /etc/sudoers
chmod 400 /etc/sudoers

Ran this again:
puppet agent --test

Nothing! Huh. However, running:
puppet apply /etc/puppet/manifests/site.pp

worked fine. So, first lesson: after any config changes you need to restart the puppetmaster:
/etc/init.d/puppetmaster restart

puppet agent --test
notice: /Stage[main]/Sudo/File[/etc/sudoers]/owner: owner changed 'thorsten' to 'root'

Boom! all good.

Now, rather than have a monolithic site manifest, Puppet best practice advocates a modular approach.

“Each module has a specific directory structure that allows Puppet to find all elements of the module and auto-load them.”

mkdir -p /etc/puppet/modules/sudo/manifests

move the ‘sudo’ class logic from site.pp to modules/sudo/manifests/init.pp:

vi modules/sudo/manifests/init.pp

# /etc/puppet/modules/sudo/manifests/init.pp
class sudo {
  package { sudo: ensure => latest }
  file { "/etc/sudoers":
    owner   => root,
    group   => root,
    mode    => 440,
    source  => "puppet:///sudo/sudoers",
    require => Package["sudo"],
  }
}

mkdir /etc/puppet/modules/sudo/files
cp /etc/sudoers /etc/puppet/modules/sudo/files

Modules in the modulepath should be auto loaded, however we can explicitly import them too:

# /etc/puppet/manifests/modules.pp
import "sudo"

Move the Node list into manifests/nodes.pp

node default {
include sudo
}

Finally, update site.pp to remove previous statements, and have it import the new modules and nodes .pp files:

# /etc/puppet/manifests/site.pp
import "modules"
import "nodes"

Again, restart Puppetmaster, then test the agent:

puppet agent --test
err: /Stage[main]/Sudo/File[/etc/sudoers]: Could not evaluate: Error 400 on SERVER: Permission denied - /etc/puppet/modules/sudo/files/sudoers Could not retrieve file metadata for puppet:///sudo/sudoers: Error 400 on SERVER: Permission denied - /etc/puppet/modules/sudo/files/sudoers at /etc/puppet/modules/sudo/manifests/init.pp:13

Second lesson! Make sure the modules dir is owned by your Puppet user account:

chown -R puppet modules

puppet agent --test

Cool, all good again!

PUPPET CLIENT INSTALL———-

apt-get install puppet

In order to have it start at boot, edit /etc/default/puppet and set START=yes:

vi /etc/default/puppet

Test it..
puppet agent --server puppet --waitforcert 60 --test

You should see:
info: Requesting certificate
warning: peer certificate won’t be verified in this SSL session
notice: Did not receive certificate

This is the inbuilt security, whereby the server needs to approve and sign the client certificates before they can be administered.

Back on the server type:
puppet cert --list

You should see your client machine listed. To approve it, run:
puppet cert --sign weebot

BANG!

Now on your client, you can run:
puppet agent --test
(the --test option adds verbose and no-daemonize to the runtime so you can see what’s going on)

You should hopefully see it now install the sudo package on your clients.

I wanted to test adding another package beyond the SUDO setup, so went ahead with NTP

cd /etc/puppet/modules/

mkdir -p ntp/manifests <– WATCH OUT, I ORIGINALLY NAMED THIS ‘MANIFEST’ – SINGULAR – AND IT TOOK ME AGES TO FIND OUT WHY IT WASN’T WORKING

vi ntp/manifests/init.pp
# /etc/puppet/modules/ntp/manifests/init.pp
class ntp {
  package { "ntp":
    ensure => installed
  }

  service { "ntp":
    ensure => running,
  }
}

Added ntp to manifests/modules.pp with:
import "ntp"

updated manifests/nodes.pp with:
include ntp

And again -
chown -R puppet modules
/etc/init.d/puppetmaster restart

On the clients:

puppet agent --test

So just to summarize – watch your permissions, watch yer typos and remember to restart!
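
Given how long the ‘manifest’ vs ‘manifests’ typo cost me, a quick sanity check over the modules directory would have helped. Here is a rough sketch of one; check_modules is a made-up name, and it only checks that each module has a manifests/init.pp, nothing cleverer.

```shell
#!/bin/sh
# Sketch only: report any module directory lacking manifests/init.pp.
#   $1 = modules directory, e.g. /etc/puppet/modules
# check_modules is a hypothetical helper name.
check_modules() {
    rc=0
    for mod in "$1"/*/; do
        [ -d "$mod" ] || continue
        if [ ! -f "${mod}manifests/init.pp" ]; then
            echo "missing manifests/init.pp in ${mod%/}"
            rc=1
        fi
    done
    return $rc
}

# Example:
#   check_modules /etc/puppet/modules
```

It returns non-zero if anything is missing, so it can sit in front of the puppetmaster restart.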

For troubleshooting, puppetmaster logs to /var/log/syslog by default.

I searched around for a while to find out whether I would need to set up cron jobs to automate the client, but no: if you set /etc/default/puppet to start at boot, as mentioned above, it will launch and daemonize itself. The default is to run every 30 mins. To find out all the default options, run puppet agent --genconfig

HADOOP SETUP -------------------------

Okay, we have one master node, and two VM’s running, all configured with Puppet. Now to setup Hadoop on them.

There are no Hadoop packages in the standard Debian repositories, so you have to add Cloudera’s manually. They don’t have specific packages for Squeeze yet, but the Lenny ones work fine.

Add the following to your /etc/apt/sources.list on the Master:
#HADOOP
deb http://archive.cloudera.com/debian lenny-cdh3b3 contrib
deb-src http://archive.cloudera.com/debian lenny-cdh3b3 contrib

Then add their public key to your apt keyring:
curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -

apt-get update
and
apt-get install hadoop-0.20
to ensure that all works fine on the server.

All good?
aiiiight..

Okay, so we’ve been using Puppet to configure the VMs, so let’s also add our apt sources.list to the Puppet configuration:

cd /etc/puppet
mkdir -p modules/apt/manifests
mkdir -p modules/apt/files
cp /etc/apt/sources.list modules/apt/files/

Create:
# /etc/puppet/modules/apt/manifests/init.pp

class apt {

  file { "/etc/apt/sources.list":
    owner  => root,
    group  => root,
    mode   => 644,
    source => "puppet:///apt/sources.list",
  }

  # Run apt-get update only when sources.list changes
  exec { subscribe-echo:
    command     => "/usr/bin/apt-get -q -q update",
    logoutput   => false,
    refreshonly => true,
    subscribe   => File["/etc/apt/sources.list"],
  }
}

chown -R puppet modules/

Add the new entries to:
vi manifests/modules.pp
import "apt"
vi manifests/nodes.pp
include apt
/etc/init.d/puppetmaster restart

At this point, I manually added the cloudera public key to both VMs (same command as above) – probably a smarter way of doing this, but this sufficed for my own efforts:
curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -

Again, run:
puppet agent --test
and if the planets are in alignment, your updated sources.list file should be copied over to the clients, and an apt-get update will be run.

Ok, so hopefully just one last step for the Hadoop installation. Let’s now add a Hadoop class to Puppet:

mkdir -p /etc/puppet/modules/hadoop/manifests
vi /etc/puppet/modules/hadoop/manifests/init.pp

# /etc/puppet/modules/hadoop/manifests/init.pp

class hadoop {

  package { "hadoop-0.20":
    ensure => installed
  }
}

The now familiar dance..
vi manifests/modules.pp
add import "hadoop"

vi manifests/nodes.pp
add include hadoop

chown -R puppet modules/
/etc/init.d/puppetmaster restart

Moment of truth.. Go to the clients and run our:
puppet agent --test

The following packages have unmet dependencies:
hadoop-0.20 : Depends: sun-java6-jre but it is not going to be installed
Depends: sun-java6-bin but it is not going to be installed

Bugger! You have to accept a user license to use Sun’s (well, Oracle’s now) version of Java, so I had to run ‘apt-get -f install’ by hand so I could answer Yes to the license. I’m not sure if there is an automated way of doing this or if it is necessary. Again, for my purposes it was fine to just do it by hand. If you have a much bigger installation you might want to find a workaround for this. (I had already installed Java on my master/physical machine.)
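
One possible workaround for the license prompt is debconf preseeding, i.e. answering the question before the package asks it. I have not run this as part of the setup above, so treat it as a sketch: the function name java_license_preseed is made up, and the debconf question name shared/accepted-sun-dlj-v1-1 is an assumption that should be verified (for example with debconf-get-selections on a machine where the license was already accepted by hand).

```shell
#!/bin/sh
# Sketch only: emit debconf preseed lines that pre-accept the Sun Java license.
# ASSUMPTION: the question is named shared/accepted-sun-dlj-v1-1; verify this
# on your own system before relying on it.
# java_license_preseed is a hypothetical helper name.
java_license_preseed() {
    for pkg in sun-java6-bin sun-java6-jre; do
        printf '%s shared/accepted-sun-dlj-v1-1 boolean true\n' "$pkg"
    done
}

# Example (as root on each client, before hadoop-0.20 is installed):
#   java_license_preseed | debconf-set-selections
```

If that works, the same two lines could be pushed out from Puppet as a file plus an exec, in the same way as the sources.list above.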

So – an almost automated install of Hadoop on a three node cluster! I haven’t configured my cluster with anything useful yet, so I’ll leave that for another article..

thor

#############

The following resources all proved useful :
http://docs.puppetlabs.com/references/latest/configuration.html
http://docs.puppetlabs.com/guides/tools.html
http://docs.puppetlabs.com/man/agent.html
http://bitfieldconsulting.com/puppet-tutorial
http://www.debian-administration.org/articles/526
http://projects.puppetlabs.com/projects/puppet/wiki/Simplest_Puppet_Install_Pattern
http://bitcube.co.uk/content/puppet-errors-explained
http://groups.google.com/group/puppet-users/browse_thread/thread/66361418d801a97c