Found this to be a particularly good episode of The Food Fight Show, with Jeremy Edberg and Adrian Cockcroft talking about Netflix's tools and architecture:
I was along at this talk last year, and just now found it online – it was one of the most informative talks I've been to, and I learned loads from it –
An excellent video, specifically about PostgreSQL on AWS, though the principles apply pretty universally to running anything on AWS –
I discovered Splunk about 6 months ago, and aside from the unfortunate name and the truly evil pricing model, I was quite taken with the app itself, a searchable realtime interface to a centralized logging server.
I read about Logstash and Graylog2 a few months back, which seemed to offer similar functionality, but I had never had the time or opportunity to implement them until now. There are many moving pieces, so it can be a bit confusing at first, but the confusion simply stems from the flexibility of how you configure the components. I found the explanation in this blog piece to be the simplest description of how the components can be connected.
Graylog2 by itself can take the place of your central syslog server. It uses quite a few other pieces of software to run – Elasticsearch, MongoDB and some Ruby gems for the web interface – so even in this most simplified setup there are still quite a few steps to getting it up and running. There are several tutorials online and linked from the Graylog2 website – the one I followed, and would highly recommend, was this blog post – it took me a few hours from initial futzing around to having a working setup.
If you already have a central syslog server up and running, you might not be so keen to change all your hosts (why not? you're running Puppet, surely! :) – in which case, you can configure Rsyslog or Syslog-ng to forward to your Graylog2 server, or alternatively use Logstash to parse, optionally transform, and forward your syslogs. This post has some decent info on setting things up like this.
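If you take the Rsyslog route, the forwarding rule itself is tiny. As a minimal sketch – assuming a made-up Graylog2 host of graylog.example.com with its syslog input listening on UDP port 514 – you'd drop something like this into /etc/rsyslog.d/ and restart rsyslog:

*.* @graylog.example.com:514

Swap the single @ for @@ if you'd rather forward over TCP than UDP.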
But of course, the most important thing is that they have a “motherfucking party gorilla with a motherfucking party hat” for their logo, and how can you argue with that??!
( Here’s why! )
The legendary Mike Hoover turned me on to Backblaze a few months back – they're a cloud storage provider who opened up the design for their chassis and storage pod solution, so you can build your own “Storage Pod 2.0: a 135-terabyte, 4U server for $7,384” (blog post here)
Unfortunately I forget where I found this link – Hacker News? The Edge Newsletter? I dunno, but it’s a pretty interesting one –
A debate between an MIT professor, Erik Brynjolfsson, and an economist, Tyler Cowen, about the role of technology in driving economic growth. My views side with the MIT professor, as did most of the audience in the debate.
I won't repeat any of the arguments made in the debate, but what I will add is that the unequal distribution of wealth we see today is not a symptom of a lack of technological growth; it is purely down to good old-fashioned political manipulation and a deep-rooted tradition of cronyism, a tradition thousands of years old.
Technology, on the other hand: absolutely, it is what will drive the economy, but even that view completely misses the big picture, which is the medium itself, the Universal Network. I believe we have created a whole new dimension, an evolutionary, mathematically abstracted form of biology. This is the beginning of History, Year Zero.
One hundred years from now, or two thousand, people will be able to look back in time and know in rich detail what our lives are like now: thousands upon millions of videos, audio recordings, images, writings, geolocations and online trails, all readily accessible, interlinked and searchable. This level of detail will only increase as we start recording every aspect of life.
With such archives of data, I can easily imagine the kids of 2123 being able to walk through and interact with a virtual London in the swinging 2020s, or San Francisco's roaring 2030s, whereas, for future generations, any time predating the late 1990s will essentially be a static, foreign place in comparison. We have created time-travel – we just don't know it yet.
This Network has already achieved a basic level of independence from humanity – it is now possible for a Something to exist outwith a single containing computer system, using techniques like redundancy and geographic load-balancing. I don't mean to imply there is any intelligence there, but there is a level of resilience we've never seen in nature before. To give a more concrete example, I'm referring to something like you, as a user, interacting with the Amazon website to purchase something; meanwhile the power goes out in the datacentre hosting the server your browser was communicating with, and, if engineered correctly, your interaction could continue, picked up by a secondary datacentre with no loss of data nor interruption of service. This isn't exactly life as we know it, but if you squint your eyes just a little, it's not too hard to see an analogy to biological cell life.
Over the next few years, society's experience of reality is going to go through the biggest change in history, as our physical world merges completely with this new virtual world of realtime, interconnected information and communication, warping our sense of time and geography.
The iPhone was stage one; Google Glasses, or something very similar, will be stage two, and it's right around the corner.
I've been trying to track down problems with really slow network transfer speeds between my servers and several DSPs. I knew it wasn't local I/O, as we could hit around 60Mb/s to some services whereas the problematic ones were a sluggish 0.30Mb/s; and I knew we weren't hitting our bandwidth limit, as Cacti showed daily peaks of only around 500Mb/s on our 600Mb/s line.
I was working with the network engineer on the other side, running tcpdump captures while uploading a file and analysing them in Wireshark's IO Graphs – the stream looked absolutely fine: no lost packets, big, non-changing TCP receive windows. We were pretty much stumped, and the other engineer recommended I look into HPN-SSH, which does indeed sound very good, but first I started playing around with different ciphers and compression.
Our uploads are all run via a Perl framework which uses Net::SFTP to do the transfers. My test program was also written in Perl, using the same library. In order to try different ciphers, I started testing uploads with the interactive command-line sftp client. Boom! 6Mb/s upload speed. Biiiig difference from the Net::SFTP client. I started playing with the Blowfish cipher and trying to enable compression with Net::SFTP – it wasn't really working; it can only do Zlib compression, which my sshd server wouldn't play with until I specifically enabled compression in the sshd_config file.
After much more digging around, I came across a reference to Net::SFTP::Foreign, which uses your system's installed ssh binary for transport rather than relying on the pure-Perl Net::SSH::Perl.
The syntax is very similar, so it was a minor rewrite to switch modules, yet such a massive payback: from 0.30Mb/s up to 6Mb/s.
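To give a flavour of the switch, here's a minimal sketch of a Net::SFTP::Foreign upload – the host, user, paths and ssh options are made up for illustration, not lifted from our framework:

use strict;
use warnings;
use Net::SFTP::Foreign;

# Connect using the system ssh binary; 'more' passes extra options
# straight through to ssh (hypothetical cipher/compression choices here).
my $sftp = Net::SFTP::Foreign->new(
    'sftp.example.com',
    user => 'uploader',
    more => [ '-o' => 'Ciphers=blowfish-cbc', '-o' => 'Compression=yes' ],
);
die 'Connection failed: ' . $sftp->error if $sftp->error;

# Upload a local file to the remote path.
$sftp->put('/tmp/upload.bin', '/incoming/upload.bin')
    or die 'Upload failed: ' . $sftp->error;

The Net::SFTP version looks almost identical, which is why the module swap was such a small change.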
(It turns out the DSPs I mentioned earlier that could achieve 60Mb/s were actually FTP transfers, not SFTP.)
I've been working pretty extensively with Xen and Puppet in my new job – really loving it! I've been creating a whole load of Xen hosts, most of which are cloned from an initial image I built using xen-tools. I've just finished a script, which is over on my GitHub page, that basically automates what was previously a manual process.
Basically, it copies your existing disk.img and swap.img, generates a new xen.cfg file based on some interactive input (desired hostname, IP, memory and number of vCPUs) plus a random Xen MAC address, then mounts the disk.img file and changes the appropriate system files – /etc/hostname, /etc/hosts and /etc/network/interfaces.
All quite simple and straightforward, but quite nice to have automated.
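As an aside, here's a rough sketch of how the random Xen MAC address can be generated (not the exact code from GenXen.pl) – Xen guests conventionally use MACs with the 00:16:3e prefix:

use strict;
use warnings;

# Build a random MAC address in the 00:16:3e range used for Xen guests,
# randomising only the last three octets.
sub random_xen_mac {
    return sprintf '00:16:3e:%02x:%02x:%02x', map { int rand 256 } 1 .. 3;
}

print random_xen_mac(), "\n";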
Here’s the README:
A script for automating Xen VM deployment.
It requires that you have a base disk.img and swap.img already created.
I created mine with:
xen-create-image --pygrub --size=50Gb --swap=9Gb --vcpus=2 --memory 6Gb --dist=squeeze --dhcp --passwd --dir=/var/virt-machines --hostname=xen-squeeze-base
Fill in some of the variables at the top of GenXen.pl before running, then simply run the script.
The interactive part will ask for hostname, memory size, vCPUs and IP address, then generate a unique Xen MAC address and write these all to a Xen config file, which will be saved in /etc/xen/.
It'll copy your disk.img and swap.img to the destination dir, mount the disk.img and create the appropriate files for /etc/hostname, /etc/hosts and /etc/network/interfaces.
After that you should be good to launch with:
xm create -c /etc/xen/whatever-your-hostname-is.cfg