The Senior Engineer’s Guide to Helping Others Make Decisions

One of the areas I often see senior engineers struggle with is raising junior engineers to the next level.
Often this is because we don’t give them the space to explore, learn and understand how to approach problems for themselves.

We’re going to look at three different scenarios to illustrate how decisions are usually made, how they could be better, and how senior engineers should be helping others make decisions, rather than making decisions themselves.

How we make decisions

The biased process

Each of us carries a set of biases on making decisions. These are shaped by years of real-world experience, often experiences that led to us getting burned. This is an incredibly powerful effect and it draws us towards certain lines of thinking – and more importantly it draws us away from other lines, even if they could lead to more positive outcomes.

When presented with a suggested course of action by another person, we start with the horrible end result that could happen and then construct a path which could lead us there, starting from the new suggestion. From here we start to look for ways to avoid the new proposal completely, using the strength of the scary arguments that would lead to disaster. The number of behavioural biases which come into play here is pretty astounding. Choice-supportive bias, Belief bias, Confirmation bias, Negativity bias, take your pick. The list is long.

Imagine this dialogue taking place:

Junior engineer: Instead of backing up the database once a week,
                 what if we did it every day and made multiple
                 copies of the backup?
Senior engineer: Every day? No that's terrible, what if the
                 backups run long and go into peak traffic?
J: Well, I don't know.. they shouldn't?
S: But they might. And how much more space will daily backups
   take? We can't afford that.
J: No, I suppose you're right.
S: The last time we tried this we took the site down for hours.
   We got in so much trouble.
J: Oh wow. I'm sorry, I didn't realise my idea was so bad.
S: You'll figure it out one day.

A perfectly reasonable, and some might even say prudent, idea was presented but the biases of the senior engineer kicked in hard.
They remembered how much a similar idea had cost them in the past and immediately wrote it off.
Not only did they shoot down the original idea, they never found out why the junior engineer was suggesting daily backups. They used strong arguments to justify their position, for which the junior engineer had no context and from which they couldn’t learn.

Worst of all, the junior engineer was left feeling reprimanded for their lack of experience, without any opportunity to learn from the situation.

How we should make decisions

We often talk about “learning by doing”, the idea that we learn best when we try to do some work. Often times, failure from trying is an excellent way to learn new things if we can overcome the bias which negative experiences force on us.

The journey

My own journey of learning how to be a good engineer started in the mid-90s when I got my first PC. It ran Windows 3.0, had 4MB RAM and a 512MB hard drive. After the “cool factor” wore off and I realised it was a pretty dumb, expensive box, I started wanting to do more with it but I couldn’t figure out how or what.

I started by reading computing magazines, mostly for three types of articles:

  • Performance tuning tips
  • Help and advice columns
  • Articles on new technologies

The advice columns were a great way to understand the problems others were having. I would check whether I was having those problems too, and try to replicate the solutions.
Quite often I would end up breaking the system in unexpected ways.

First lesson learned: Have install disks handy.

In this way I built up a body of knowledge on recoverability, safe testing, getting comfortable with worst case scenarios, and so much more.
Many operations/production engineers had similar learning experiences, yet we too often shy away from this in our professional work when we should be encouraging junior engineers to work like this more.

 

Half way there

The above conversation is quite common not just in our industry but in many situations in both professional and personal relationships.

After this, we have what I call the “half-way approach”.
Let’s take a look at how that conversation went:

Junior engineer: Instead of backing up the database once a week,
                 what if we did it every day and made multiple
                 copies of the backup?
Senior engineer: Every day? We tried to do this once before, but
                 it didn't go well.
J: What went wrong?
S: The backups ended up running long, and when we got to peak
   traffic in the day, the site went down for hours.
J: Oh, that sucks. Is there anything we could do about it?
S: I've had some time to think about this since the last time.
   If you start the backups earlier and add a cron job to kill
   them if they take too long, that would save us some trouble.
J: Ok, I can do that.
S: And have that job send us an email so we know it happened.
   But before you do any of this, open a ticket to see if we can
   buy more storage space. If we can't do that, this whole thing
   is moot.
J: Great, I'll get on it. Thanks!

This is certainly an improvement over the previous scenario. There is less hostility, and more progress on getting the work done.
On the other hand we still have two major issues:

  • The engineers seem to be working together towards a solution, but the senior engineer is planning the work based on their prior experience, and the junior engineer is simply following already carved-out steps.
    The junior engineer is a passive participant, taking notes and then going off to implement a given solution.
    There is still very little opportunity for the junior engineer to learn and grow.
  • The senior engineer still hasn’t asked the junior engineer why they want to do the work, and what problem they’re trying to solve.
    There’s an assumption being made that the two engineers are working on the problem for the same reasons, but we don’t know if that is truly the case.

 

A better approach

Breaking our biases is extremely difficult.
Even more difficult is remembering that a lot of ego and personal attachment can play into the decisions we make, too.
When other engineers come to us with problems, ideas and approaches, we can feel a strong desire to direct them to solutions we would like to see implemented, based on our ideas and experiences.

Being a senior engineer means realising not everything should be done the way you want it to be.

The corollary to this is:

Others may make tools, systems or decisions which aren’t as good as the ones you would make, but they should be able to make them anyway.

Engineers, whether junior or senior, should be provided guidance and suggestions to help them form stronger ideas and improve their ability to reason, rather than being given solutions to implement or being outright shot down.

Let’s take the conversation between our two engineers, and see what it would look like with this approach:

Junior engineer: Instead of backing up the database once a week,
                 what if we did it every day and made multiple
                 copies of the backup?
Senior engineer: Interesting idea. What's the problem you're
                 trying to solve?
J: I'm worried that we could lose a week of data if the servers
   crash.
S: That's a good point. You might run into a problem though:
   What if the backups end up running at peak time? We do them
   on the weekend because it's quiet.
J: I see. We can try to start them earlier, and maybe add
   monitoring to alert us if it happens?
S: That's a good start. Set it up and let's see what it looks like.
[ some time later ]
J: I was thinking, wouldn't it be great if we could just back up
   what changed every day, rather than everything? It seems like
   a waste of space.
S: That would be great! I have some ideas on how you could do that,
   but why don't you go and try a few things yourself first?
   If you get stuck, let me know.
J: Thanks!
[ some more time later ]
J: Ok, I think I have a possible solution. It's pretty great!
   I learned about this language called Perl and it made it really
   easy to do what we need.
S: Perl. Huh. Well.. there are some issues you might run into later
   with maintainability because only you and I know Perl. But if
   you like the solution, and it works, let's do it.
   Make sure you write really good documentation on how this works,
   and how to fix it if it breaks.
   And don't forget to comment your code!
S: Oh, and do you have a ticket to track all this work? That's also
   a good way to document what you did and why, so we can look back
   on it later if we need context.
J: Not yet, but I'll make one before I do anything else. Thanks!

It’s important to remember that after exploring the space, the engineers together might realise the solution is unworkable, or inefficient, or not feasible for business reasons. It’s even more likely that the junior engineer would come up with a solution which works fine, but isn’t what the senior engineer would have done themselves.

Fear of these is not justification for blocking the work.

The junior engineer picked up new hard skills:

  • Learning a new language
  • Implementing a backup solution

As well as a bunch of critical soft skills:

  • Improved decision making
  • Engaging other engineers
  • Presenting and refining ideas
  • Presenting solutions
  • Tracking work

And that’s all before the solution went to production!

Helping junior engineers raise strong proposals

Presenting ideas is a learned skill. We develop it over time from interactions with others, discussions and debates, watching talks at conferences and attending meet-ups.

As senior engineers, it’s important for us to help engineers around us develop this skill.

In the third conversation, the senior engineer asks the junior engineer what the problem is they’re trying to solve – this approach is the Socratic method.
Other good questions to ask are:

  • What are the top three solutions you investigated before you settled on this one?
  • Why did you reject the others?
  • Do any of them have merits which we should investigate later?
  • What could go wrong with your solution?
  • How maintainable is it if you’re not available?
  • What data do you have to back up your assumptions?

The goal is to get your engineers thinking about these questions before they come to you with proposals.
Have them present the answers to these along with their initial idea.
Over the long term this will help them develop strong data-driven solutions, evaluate multiple solutions and make their ideas more approachable.

Almost all work is a learning opportunity, but you shouldn’t encourage junior engineers to pick low-risk or low-impact work purely for the learning opportunity. Find time which can be dedicated to pure learning, and allow them to work on difficult, and even risky tasks with guidance.

The impact of language

To wrap up this post I’d like to take a moment on how the words we use can have a huge effect on the tone and impact of a conversation.

In the third conversation, the senior engineer didn’t use the words “no” or “but”. Instead they promoted the idea as being plausible and encouraged the junior engineer to explore the space. The tone of the two conversations was very different. The first bordered on hostile and defensive, while the last was open, encouraging and welcoming.

Even though the solution may not have been designed and implemented the way the senior engineer would have done it, it still worked and provided the junior engineer a feeling of ownership. It gave them the ability to learn a lot more about the space, which they wouldn’t have done if their initial idea had been shut down, or if they were told what to implement and how to do it.

Instead of “no”, “but” or “yes, but…” (a subtle negative), learn to say “yes”, or “yes, and…”.
Encourage other engineers to go forth and come up with solutions to problems, and then help them implement those solutions.
They will not be the same as the ones you would come up with.
They may not even solve all the problems you see now, or see coming down the road.
But they are a start, and an important opportunity for others to make progress, grow and be successful.

Thanks

Thank you to Jon Cowie and John Looney for reviewing this post and helping me refine a lot of the language and focus.

Temporary fix for Sony A7ii stuck shutter issue

Last week I purchased a Sony A7ii, which I absolutely love.
The camera is fantastic for so many reasons that are covered by many reviews. Unfortunately I was hit by a small known issue with some of the camera builds: sometimes, after you take a picture, the shutter will get stuck, and stay closed.

The symptoms of this are easy to see. After you take a shot, the LCD display turns black. If you try to take a picture, the camera reports “camera error turn power off then on.”
Of course, turning the power off and on doesn’t actually help at all. Neither does cleaning the sensor.
You can usually nudge the edge of the shutter upwards very gently to get it to release.

After switching lenses several times with no improvement, I was demonstrating the issue to my wife without a lens attached.
Immediately the shutter stopped getting stuck!
I reattached the kit lens, and another lens, and still the camera has been fine.

So if your shutter is getting stuck, try taking a few shots with the lens off in a clean environment – you don’t want to get the sensor dirty!

The importance of benchmarking

As an operations engineer, my go-to philosophy is often “make it work first, then make it work better.” When you write a script, or apply a complex change, you don’t always have the luxury of making it perfect in the first iteration.

As always, there are exceptions to this rule. In today’s case, I was making a change to an application which runs the same method billions of times a day.
In situations like this, small changes can have very large consequences.

The change I’m making requires checking whether a timestamp is within a specific range, in this case +/- one day. Whichever language you’re working in, there are no doubt multiple ways to achieve this. I’m using Ruby, and stackoverflow offered me a few quick and reasonable solutions:
http://stackoverflow.com/questions/4521921/how-to-know-if-todays-date-is-in-a-date-range

In an application where this would be run a few billion times a day, microsecond changes in performance can add up to a lot of lost CPU time.

Benchmarking to the rescue!

Ruby has a lovely and easy way to benchmark these. (Python does too, find them for your favourite language!)
This is the script I used to compare 100,000 runs of each approach to the problem:

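#!/usr/bin/ruby
## Benchmark a couple of different ways to see if a given time is in a time
## range of +- one day.
require "benchmark"
require "date"

RUNS = 100_000
ONE_DAY = 86_400 # Time arithmetic is in seconds, so +- one day is +- 86,400

# First approach: a Date range, where +- 1 means +- one day.
time = Benchmark.measure do
  RUNS.times do
    now = Date.today
    ((now - 1)..(now + 1)) === now
  end
end
puts "The first test took: #{time}"

# Second approach: a Time range checked with cover?.
time = Benchmark.measure do
  RUNS.times do
    (Time.now - ONE_DAY..Time.now + ONE_DAY).cover?(Time.now)
  end
end
puts "The second test took: #{time}"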

Running the script produces the following result:

The first test took: 0.580000 0.130000 0.710000 ( 0.712725)
The second test took: 0.250000 0.000000 0.250000 ( 0.247216)

Great! A quick win. After another look, we can even spot that we’re calling Time.now three times. It probably makes sense to only call it once:

# Time arithmetic is in seconds: 86,400 seconds = one day.
now = Time.now
(now - 86_400..now + 86_400).cover?(now)

If we run the benchmark again, we now see:

The first test took: 0.580000 0.120000 0.700000 ( 0.706634)
The second test took: 0.160000 0.000000 0.160000 ( 0.163877)

An even more significant improvement.
Writing and running this benchmark, with tweaks and measuring averages, took me about 10 minutes.
It saved me significantly more time, and money.
In production, this code runs 40,000 times more often in one day than it did in my benchmark.
Scaled up, the difference between 6,400 seconds (the second test) of CPU time and 23,200 seconds (the first test) is pretty big.
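For reference, the three variants discussed above can be put side by side in one script. This is only a sketch: the names RUNS, ONE_DAY and CANDIDATES are my own, it uses Benchmark.bm from Ruby's standard library to print one labelled row per variant, and it uses 86,400 seconds for a day because Ruby's Time arithmetic is in seconds. Wrapping each check in a lambda adds a small, equal call overhead to every row, so the absolute numbers will differ slightly from the ones shown above.

```ruby
require "benchmark"
require "date"

RUNS = 100_000    # same run count as the benchmark in this post
ONE_DAY = 86_400  # seconds in a day; Time +/- Integer works in seconds

# Hypothetical labels mapped to lambdas; each lambda returns true when
# "now" falls inside the +/- one day range.
CANDIDATES = {
  "Date range ===" => lambda {
    now = Date.today
    ((now - 1)..(now + 1)) === now
  },
  "Time cover? (3 calls)" => lambda {
    ((Time.now - ONE_DAY)..(Time.now + ONE_DAY)).cover?(Time.now)
  },
  "Time cover? (1 call)" => lambda {
    now = Time.now
    ((now - ONE_DAY)..(now + ONE_DAY)).cover?(now)
  }
}.freeze

# Benchmark.bm prints user, system, total and real time for each report
# and returns the measurements as an array.
results = Benchmark.bm(24) do |x|
  CANDIDATES.each do |label, check|
    x.report(label) { RUNS.times { check.call } }
  end
end
```

Keeping the candidates in one hash makes it cheap to add a fourth idea later and re-run the whole comparison.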

How big is the difference?

Let’s do the math!

23,200 - 6,400 = 16,800 seconds of wasted time.
16,800 seconds / 60 / 60 = 4.67 hours of CPU time per day.

I’d say that’s pretty big.
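The same arithmetic can be written as a few lines of Ruby, which makes it easy to redo with your own numbers. This is a minimal sketch: the constant and method names are mine, the two timings are the user CPU figures from the benchmark output above, and the 40,000 multiplier is the one stated in this post.

```ruby
# Extrapolate the benchmark timings to one day of production traffic.
# SCALE is the post's figure: production runs 40,000 times the
# benchmark's volume per day.
SCALE = 40_000

# User CPU seconds reported by Benchmark.measure for 100,000 runs.
FIRST_TEST_SECONDS  = 0.58
SECOND_TEST_SECONDS = 0.16

# Scale a benchmark timing up to a full day of calls.
def daily_seconds(benchmark_seconds, scale = SCALE)
  benchmark_seconds * scale
end

wasted_seconds = daily_seconds(FIRST_TEST_SECONDS) - daily_seconds(SECOND_TEST_SECONDS)
wasted_hours = wasted_seconds / 3600.0

puts format("Wasted CPU time per day: %.0f seconds (%.2f hours)", wasted_seconds, wasted_hours)
```

Swapping in a different scale factor or fresh timings shows quickly whether a micro-optimisation is worth the effort.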

The case for distributed teams

Recently, Paul Graham wrote an essay, making the case to change immigration rules to allow more excellent programmers into the US.

While Paul makes a number of excellent points, I think he’s missed the mark. There is a certain mindset in large parts of our industry that in order to succeed as a programmer, or to succeed as a start up, you must be located in Silicon Valley, and you must always work locally with your team. It’s probably no surprise that most of the people who hold this viewpoint are also in Silicon Valley.

Since his essay was published, Twitter has been ablaze with discussion on this matter, with a number of counterpoints being raised:

  • The desire to increase immigration stems from the desire to lower wages.
    While Paul didn’t state this in his essay, this feeling is woven into the fabric of the Silicon Valley mindset. People are encouraged, if not outright expected, to work long hours without extra pay. Some might argue that this is the price of success for a start up, but I don’t believe it is.
    The recent class action lawsuit against Adobe, Apple, Google, Intel and others highlights the desire by larger employers to keep wages suppressed too. According to data from salary.com, the more senior software engineers can easily cost companies $200,000 or more a year. And given the high demand for filling these positions, you can only expect that to go up.
  • The social and economic impact of cramming ever growing numbers of highly paid tech employees into the San Francisco bay area has also led to a great inflation in housing costs, contributed to a housing shortage and other issues.
  • There is a viable alternative to bringing more people into the US (and specifically into Silicon Valley), which is distributed teams and remote engineering. Paul has held that it is better to have people sitting together in the same office, that you lose something by not being able to have chance meetings, and that start up founders want people in the same place. Many of us believe that this isn’t necessarily true and that like all good things in life, balance is important. Paul later suggested that the success of distributed teams depends on the product being developed, implying that some (or many?) products cannot be created easily by distributed teams, which I also believe is not accurate.

I agree with many of the points Paul raised (most senior programmers ARE outside the US, and it does often help to have people in the same room to hash things out). However I don’t believe that fixing immigration laws should be the first, or only thing we talk about.

I’ve been involved in the New York start up scene since I joined Etsy in 2010. Since that time, I’ve seen more and more companies there embrace having distributed teams. Two companies I know which have risen to the top while doing this have been Etsy and DigitalOcean. Both have exceptional engineering teams working on high profile products used by many, many people around the world. There are certainly others outside New York, including Automattic, GitHub, Chef Inc, Puppet.. the list goes on.

So how did this happen? And why do people continue to insist that distributed teams lower performance, and are a bad idea?

Partly because we’ve done a poor job of showing our industry how to be successful at it, and partly because it’s hard. Having successful distributed teams requires special skills from management, which aren’t easily learned until you have to manage a distributed team. Catch 22.

My hope is that Paul and others will read this post, and see that managing remote engineers isn’t a losing proposition, and that it can be (and is being) done with great success.

Here are some key factors which I feel set distributed teams up for success.

Culture and management

The primary reasons for the success or failure of distributed teams come down to the culture of the organisation, and the strengths of management to enable and empower engineers to succeed regardless of location. Michael Rembetsy, Etsy’s VP of Operations, has given a number of culture based talks which touch on this. My favourite is one from Velocity EU 2012 on Continuously Deploying Culture. He mentions how Etsy used to have “too many” remote engineers, but “too many” was in the context of a poor culture. Before 2010, we trimmed down the number of remote engineers we had so that we could fix the core cultural issues, which then enabled us to hire many more remotes.

I’ll say this again because it’s very important:
The success or failure of a distributed team hinges on your organisational culture and the strengths of management, not on the product you’re creating or the nature of distributed teams themselves.

So what are some things organisations should work on, to promote success amongst distributed teams?

Become a learning organisation

Peter Senge promoted the idea of a learning organisation in The Fifth Discipline, as an organisation that is constantly learning and evolving. This has been a core part of Etsy’s success, and also integral to enabling our distributed engineers to succeed. The natural human reaction to negative events, which traditional corporate cultures foster, is to point blame and move on. With remote employees this can be even more of a problem. Feeling disconnected and alone while people “somewhere else” decide what happened can be unnerving and promote an unjust culture.

In 2010 Etsy had its first no-blame postmortem. John Allspaw wrote about this in 2012, detailing what this process involves and why it is important. Developing a Just Culture is crucial to supporting and empowering distributed teams. Individuals have to feel safe in their work, and supported by the organisation they’re in, if they’re going to do their best work. This is true with local employees, and especially true with remotes.

Over communication is the best communication

Early in my career I worked as a systems administrator for a large ISP in the US on the night shift. We had little communication overlap with other teams, but we were primarily responsible for maintenances and downtimes. Some time into my employment, towards the end of one of my shifts, a member of another team came up to me and said “Boy, Av, you sure do reply to a lot of emails. Maybe you shouldn’t reply to everything you see?”

At the time, his advice felt strange and wrong, but I didn’t understand why. It wasn’t until my first position as a remote engineer in 2008 that I truly understood the importance of over communication. Since that time, the comments have turned around and people are glad when I and other remote engineers take the time to write down and express things in detail. Again, this is something that we can all do better regardless of our relative locations, but it’s especially true with remote engineers. Over communicating my actions, work load, thoughts and ideas with my peers has helped advance a sense of trust and team which is harder to do when you’re far away.

Communication has to work both ways. My manager and I have multiple scheduled 1:1 meetings each week. Sometimes they’re hard for us both to get to, but we try because they’re important. We take the idea of a “chance meeting”, and develop it during our chats.

Communicate as remote, by default

One of the final steps all engineers need to take to promote success is to communicate as a remote engineer all the time. This is where Paul’s “chance meetings” really come in. At Etsy, our primary method of communication is IRC. Other organisations use Slack, HipChat, IRCCloud, Skype. The tool is less important than the idea, with the idea being to always use it for communications:

  • Is the person 3 desks away? Use IRC.
  • Is the person a minute walk away? Use IRC.
  • Is the person working at home? Great! You’re already used to using IRC.

In addition to this, we do our best to keep communications public, for anybody to see and join in. This significantly increases the probability of “chance meetings” happening virtually.

Deliberately making teams distributed

In almost every case where I’ve seen a distributed team fail, it was down to a lack of distributedness. Having only one or two people in a team of 10 working from a remote location, does not constitute a distributed team. If you want to increase the chances of success, make the majority of the team remote. In doing so you cause a cultural shift where everybody has to learn how to communicate and work in this mode. You won’t achieve this by having a single person be a remote engineer.

At Etsy we have some teams where many or most people are remote. A happy side effect of this, was the tendency of the remote engineers to band together, create their own IRC channel and develop social relationships across team boundaries. This happened quite naturally and was almost surprising.

Travel

I said early on that I agree with some of Paul’s points, and I truly do. He’s absolutely right that being in the same room as others fosters something special. There are two clear cases I can recall in the last few years when being in the same room as my teammates has caused magic to happen:

  • After hearing Joshua Hoffman speak at VelocityConf about Collins, a tool they were developing to manage the infrastructure and provision hosts more easily, several of us got together, rented a whiteboard from the hotel, planned and started coding. We each went home, and still working together knocked out a working prototype for our own such system within a week. It was truly a special moment.
  • In late 2011 our Search team was having trouble replicating very large Solr indices across many machines, it simply took too long. One evening in our New York office, I suggested (somewhat off-hand) to my colleague Laurie Denness, that we could use BitTorrent instead. Within moments the sparks were flying as gears shifted and magic was ignited. Our search team later wrote up how the final solution was implemented for the world to share.

My point here isn’t that magic can only happen in person, rather that magic doesn’t happen all the time. We cultivate the magic with frequent travel. I travel to our offices once a month, others travel once per quarter. On each trip, some amount of magic does happen. In between those times, we have a lot of other work to do. We still manage to make magic happen when we aren’t in the same room.

If anything, I would contend that magic happens more often after work, over food and drinks, or during casual conversation. Regularly traveling to meet your peers more than achieves that, and leaves the rest of your time free to focus on implementing the magic you have brought forth.

Balance

And this is the real point I want to make here to Paul and others who are advocating various solutions: Everything needs balance. Not everyone is cut out to be part of a distributed team, and not everyone works best in an office. A great many factors come in to this. But at the same time, closing the door on hiring remote engineers, not focusing on developing solid cultures that foster growth, learning and work irrespective of location hurts companies, individuals and the economy at large.

One data point which I find very interesting: More than half of the senior engineers at Etsy are remote employees. These are your highly influential people, who impact the work of others around them, and bring other engineers up with them across the company. And many of them do it while not being in the same office.

In closing…

Paul states that 95% of the best programmers must be outside the US as 95% of the world’s population is also outside the US, and that “exceptional programmers have an aptitude for and interest in programming that is not merely the product of training.”

You’re right, Paul. Their aptitude isn’t merely the product of training. It is the product of learning, culture and growth. These aren’t things we teach to programmers, rather they are environments in which we enable them to flourish.
I don’t believe we do nearly enough to find and bring forward the people who could be the best programmers here. Our education system is ill equipped to develop skills like critical thinking (with some arguing that it cannot even be taught!), which are necessary to growing more “10X” people. Our business schools don’t teach managers to create Just Cultures and the environments we need to cultivate more exceptional programmers.

There is a whole lot we can do, but it isn’t necessarily easy. Flooding the market is equally not the answer – there is no test for what an exceptional programmer looks like, so it’s nearly impossible to legislate around. The result would be opening the market to many many programmers, driving down wages, potentially increasing unemployment, and increasing the difficulty of hiring people – 1000 resumes/CVs are harder to go through than 100. As a corollary to Parkinson’s Law: The number of companies hiring exceptional programmers increases with the number of exceptional programmers. We will never have enough.

If you truly believe there is a talent shortage (in the US or otherwise), or a shortage of exceptionally talented people, I direct you to Andrew Clay Shafer’s Velocity NY 2013 talk There Is No Talent Shortage. Andrew talks deeply about culture, Nash Equilibria, and Organisational Learning. Our organisational cultures are what develop those exceptional “10X” programmers, and we can make more, and they can be anywhere in the world. Doing this industry-wide will take a significant cultural shift to get enough people on board. Ultimately, this approach will have far greater payoffs than changing the immigration policies of a country.

The future is here: it is learning organisations and distributed teams.

On-call with Google Hangouts

Over the last 15 years I’ve constantly found myself part of an on-call rotation wherever I have worked. It’s par for the course as an Operations Engineer.
For many years I carried a Blackberry with me for two reasons:
  • Mostly reliable message delivery
  • Highly customizable notifications

Message delivery reliability has improved across the board, so I don’t feel that Blackberry has any special advantage here any more.
Customizable notifications was the real must-have feature. You could have SMS notifications repeat constantly until answered, or every minute or every few minutes. They could start out on vibrate, followed by quiet sounds, and then louder sounds.

It was, in a word, perfect.

With those days behind us, I’ve regularly looked for a modern system which could emulate the old Blackberry functionality. For the last year and a half I’ve been using Chomp SMS on Android because it has a crude repeat-notifications option. It kind-of sort-of works.

When Google Hangouts fully enabled support for Google Voice I wanted to switch to using that, but Hangouts has no repeat notification functionality, so if you miss a page, you’re screwed.
Enter Repeating Notifications, a very cool app that lets you customize repeat notifications for *any* app.

It’s also incredibly simple to use: Install it, and in the settings enable repeat notifications for Google Hangouts. Then tap Google Hangouts, and choose the repeat interval. You can also choose how you want repeat notifications to be silenced: When you turn the screen on, when you unlock the phone, or when you open Google Hangouts.

I tried this with my last on-call rotation, with notification repeating set to 30 seconds. It worked like a charm.

CentOS 6.3 DomU on a Debian Dom0

Warning: I recently went through this process, and took notes, but a few pieces here are from memory. If things don’t work as expected, please let me know and I’d be happy to help.

Background

I’ve been running Debian for a long time – steadily for about 6 years now. Before that I was a BSD fan, so many things about Debian appealed to me.

Over time my needs have changed. Much of my work involves using CentOS, and I quite like the ease of managing it. Debian is still nice, but it always had a few things I wasn’t a fan of, such as split (conf.d style) configurations. No disrespect to Debian, it’s a solid distro. The style of distribution I’m drifting to is purely personal preference.

My existing systems are all Debian VMs which live in a Debian Xen Dom0.

Objectives

Given our Debian Dom0, we want to install a CentOS 6 DomU.
The DomU must have XFS on the root filesystem, and load its own recent kernel rather than the one supplied by the Debian Dom0.

Problems and solutions

  • Installing a CentOS (or RedHat, or Fedora) client is non-trivial. Debian has debootstrap which bootstraps the initial OS on the filesystem. We will use mock for this.
  • We need a newer kernel. CentOS 6 doesn’t come with Xen-enabled kernels since RedHat pulled Xen support from their kernels, so we need a 3rd party kernel, or to build our own.
  • We need to load the new kernel. In order to do this we need to use pygrub which comes with Xen.
  • pygrub can’t read XFS filesystems, so we need a separate /boot partition from which it can read the grub menu.

The right tools for the job

To make this work, we need the following tools:

  • pygrub (comes with Xen)
  • mock (apt-get install mock)

Xen and LVM and filesystems

This particular setup uses LVM to create filesystems for each virtual machine. Many moons ago I found this to be more performant and cause fewer issues than file-backed filesystems. However that isn’t the case these days. My examples may show LVM partitions being used but this isn’t necessary.

Additionally, I wanted to start using XFS on the new system. After 6 hours of trying to make it work, I realised that pygrub doesn’t yet support reading XFS filesystems.
If you want to use XFS on your root filesystem, you’ll need an ext2 or ext3 /boot partition which pygrub will boot from.

Instructions

Basic VM initialisation

Create two filesystems which will be used by the new VM. My LVM volume group is called xen-vol. Yours will probably be called something else. If you want to use file-backed filesystems, you can do that instead at this point. Here’s the LVM way:

$ lvcreate -n newvm-boot -L 200M xen-vol
$ lvcreate -n newvm-root -L 100G xen-vol
$ mkfs.ext2 /dev/xen-vol/newvm-boot
$ mkfs.xfs /dev/xen-vol/newvm-root

Configuring Mock

mock is the tool we use to create the bare VM. Normally it is used to create a chroot environment for tasks like building packages. However it works really well for creating a basic OS layout which can then be booted.

By default it doesn’t come with a configuration file for CentOS 6, so create the file /etc/mock/centos-6-i386.cfg and populate it with this:

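# Assumed header: mock needs a root name, and the triple quote further down
# has to close a yum.conf string opened here. Adjust these lines to taste.
config_opts['root'] = 'centos-6-i386'
config_opts['yum.conf'] = """
[main]
cachedir=/var/cache/yum
reposdir=/dev/null
gpgcheck=0
assumeyes=1

[base]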
name=base
baseurl=http://mirror.centos.org/centos/6/os/i386/
gpgcheck=0

[update]
name=updates
baseurl=http://mirror.centos.org/centos/6/updates/i386/
gpgcheck=0

[buildsys]
name=buildsys
baseurl=http://dev.centos.org/centos/buildsys/6/
gpgcheck=0

"""

config_opts['macros']['%dist']=".el6.centos"
config_opts['macros']['%centos_ver']="6"
config_opts['macros']['%rhel']="6"

Populating the VM

Now we’ll mount and populate the VM using mock.

mkdir -p /mnt/newvm/boot
mount /dev/xen-vol/newvm-root /mnt/newvm
mount /dev/xen-vol/newvm-boot /mnt/newvm/boot
mock --init -r centos-6-i386 /mnt/newvm

mock will now set up a mostly complete chroot.

Configure the VM

For each of the following steps, you will need to be chrooted into /mnt/newvm first:

chroot /mnt/newvm /bin/bash

Install and configure Grub

Grub is the bootloader we use, so install that:

yum install -y grub

Now you’ll need a /boot/grub/grub.conf file. Create it and make it look like this:

default=0
timeout=2

title CentOS 6.3 i686
  root (hd1,0)
  kernel /vmlinuz-3.6.7-1.el6xen.i686 ro root=/dev/xvdb crashkernel=auto noquiet rdshell
  initrd /initramfs-3.6.7-1.el6xen.i686.img

Now make a new /boot/grub/device.map file which looks like this:

(hd0)   /dev/xvda
(hd1)   /dev/xvdb

The astute readers will notice that we just told Grub to boot from a kernel which doesn’t exist! Let’s do that next.

Install the new kernel

One of the beautiful things about Linux is the way people love to help each other. In our case, someone has built the kernel we need to run CentOS 6 as a Xen DomU client. To install it do:

yum install http://au1.mirror.crc.id.au/repo/kernel-xen-release-6-4.noarch.rpm
yum install kernel-xen

Final configurations

You’ll need an /etc/fstab file, like this:

/dev/xvda           /boot               ext2    noatime         1 1
/dev/xvdb           /                   xfs     noatime         1 1
tmpfs               /dev/shm            tmpfs   defaults        0 0
devpts              /dev/pts            devpts  gid=5,mode=620  0 0
sysfs               /sys                sysfs   defaults        0 0
proc                /proc               proc    defaults        0 0

And set the root password:

$ passwd root

Xen configuration

Leave the chroot and return to the Dom0 shell.
You will need a Xen configuration file for this VM, which should look something like this:

bootloader = '/usr/bin/pygrub'
extra      = 'console=hvc0 xencons=tty'
memory     = '256'
vcpus      = 2
name       = "newvm.example.com"
hostname   = "newvm.example.com"
vif        = [ 'ip=<ip address>' ]
netmask    = '<netmask>'
gateway    = '<gateway address>'
ip         = '<ip address>'
broadcast  = '<broadcast address>'
disk       = [ 'phy:xen-vol/newvm-boot,xvda,w', 'phy:xen-vol/newvm-root,xvdb,w' ]
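
A mistyped volume name in the disk line is an easy mistake to make, and it only shows up when the VM fails to boot. Because xm configuration files use Python syntax, a small script can pull out the phy: entries so they can be compared with the logical volumes created earlier. This is a rough sketch: the function name is made up for the example, and it assumes every disk entry has exactly three comma-separated fields.

```python
import ast
from typing import Dict


def phy_disks(xen_config_text: str) -> Dict[str, str]:
    """Map each guest device (e.g. 'xvda') to its Dom0 block device."""
    tree = ast.parse(xen_config_text)
    disks: Dict[str, str] = {}
    for node in tree.body:
        if not isinstance(node, ast.Assign):
            continue
        names = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if "disk" not in names:
            continue
        for entry in ast.literal_eval(node.value):
            backend, guest_dev, _mode = entry.split(",")
            kind, _, device = backend.partition(":")
            if kind == "phy":
                # Relative phy: paths are taken to live under /dev.
                if not device.startswith("/"):
                    device = "/dev/" + device
                disks[guest_dev] = device
    return disks
```

Running each returned path through `os.path.exists` on the Dom0 before starting the VM will flag a volume that doesn’t exist.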

Boot the VM

Unmount the partitions and start the new VM:

umount /mnt/newvm/boot
umount /mnt/newvm
xm create -c <path to the Xen configuration file above>

How to secure your Facebook account

Something I never hear people ask me:

“Av, how do I secure my Facebook account?”

Which leads me to believe one of two things:

  1. Everyone knows how to secure their Facebook account
  2. People like myself have done a very poor job of explaining to you why you need to secure your Facebook account, and how to do it

Obviously I feel it’s the latter, and now I intend to correct it. Here are some very simple reasons for why you need to take 10 minutes to secure your Facebook account and how to do it.

Why you should do it

There are two big reasons why you should continue reading this post.

First of all, you know those fun games you like to play?
And those surveys you filled out a while ago about food you like to eat?
Or perhaps that list of countries around the world you’ve visited?

Every time you did that, you authorised someone (often we don’t know who!) to do some or all of the following:

  1. Post to your wall
  2. Post things your friends will see
  3. Know who your friends are
  4. Read your status updates, and know everything you say

For some things, this isn’t a big deal. For example, you WANT the FourSquare app to post to your wall.

But do you really want the Crap I Like To Eat survey posting things on your wall 6 months after you forgot about it? Probably not.

Secondly, you’ve either had someone hack into your Facebook account, or you know someone it has happened to (even if they don’t admit it). Perhaps you’ve had someone message you with a scam message on Facebook? It’s surprisingly common, and doing things to help prevent it is really easy too.

How to prevent problems

This is really easy, takes about 10 minutes at the most, and there are only two steps!

De-authorize those apps!

Go here: https://www.facebook.com/settings?tab=applications and click the “X” next to every app you don’t use any more. Now those apps can’t see your Facebook data!

Two-factor authentication

Two-factor authentication means that when someone tries to log in to your Facebook account, Facebook will send a text message to your phone with a code. That code needs to be entered on the website before someone can log in.

If this sounds complicated, I promise it’s not.
Facebook remembers which computers you use, and only needs you to enter this code when you use a new computer (or if you “log out” on your existing computer). This means that most people will only need to get a text from Facebook once in a blue moon. But it also means that no one can log in to your account without you knowing. Here’s what you do:

Go to your Facebook security page and click Edit next to Login Notifications. Check either email, text message, or both – however you want to be told someone logged into your account. I use both just in case. Then click Save changes.

Next click Edit next to Login approvals and walk through the steps to set up the approval process I described above. If you have a smartphone (like an iPhone or Android), you can optionally install an app to make those codes for you, so Facebook doesn’t have to send you a text. This is completely optional, and up to you if you want it.
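
For the curious, code-generating apps of this kind typically work by combining a secret shared with the website and the current time, as described in the public TOTP standard (RFC 6238). The sketch below shows that standard calculation. Whether Facebook’s own code generator follows it exactly is an assumption, and the function name and defaults are chosen just for this example.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 time-based one-time code using HMAC-SHA1."""
    # Both sides count how many `step`-second windows have passed.
    counter = unix_time // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): pick 4 bytes based on the last nibble.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)
```

Because the phone and the website both know the secret and the time, they arrive at the same code without any text message being sent, and the code stops working once the time window moves on.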

That’s it!

Pretty painless overall! There are other things you can do on that security page, like see all the places you’re logged in to Facebook from right now. You may see your home computer, your work computer (naughty, naughty!), your phone, and maybe even something that shouldn’t be there! Facebook has made it really easy to take care of your account on that one page!

Speed up your Android tablet

After using several mid-range and entry level Android tablets, I’ve become familiar with one striking problem they all seem to suffer from: very slow internal storage.
This problem is especially noticeable on devices such as (some of?) the Coby tablets which have only one CPU core – when applications block on disk I/O, the entire tablet freezes.

A quick fix for this seems to be to add a Micro SD card, and to move all of your applications to it.

The catch: The normal “move app to SD card” feature in Android doesn’t always move the apps to your external SD card. In fact it seems to almost never do that for me. Instead the apps are moved to a separate part of the internal storage.

A great app called Link2SD takes care of that. There’s a small amount of setup you have to do first, by plugging the microSD card into your computer, and then creating two partitions on the card. There are some instructions on the Google Play page for how to do this.

After moving my browser and all of my apps to a new SD card, I’m finding that my tablet is very rarely slow. Web pages open significantly more quickly and the UI has yet to hang even once.

I installed a cheap 4GB class 4 SanDisk microSD card. I was hoping to find a faster one but my local stores don’t carry anything other than class 4 cards. Turns out, this was just fine!

Rooting and installing Gapps on Coby 1042 tablet from Linux/OSX

Vacation time is great!

I went into Fry’s today and found they have the Coby 1042 (10.1″, Android 4.0.3) on sale. Normally I don’t pay much attention to the lower end tablets because they miss one or more features I think are pretty critical in a tablet:

  • Capacitive display
  • Fast CPUs
  • SD card slots
  • Ice Cream Sandwich
  • The ability to easily mod/root/extend

The Coby device, however, meets all of these requirements!
And it comes with an HDMI port, full USB port to attach a keyboard and mouse, and a nice soft neoprene sleeve too.

Rooting

Requirements
  • Install the Android SDK to get the adb binary
  • Download and decompress coby_root3.zip (I have a backup copy if this link stops working)

Rooting the device is beyond trivial.
In the coby_root3 directory is a data directory, which you need to push to your device before running a few adb commands:

$> cd coby_root3
$> adb push data /data/local/tmp
$> adb shell chmod 0777 /data/local/tmp/mempodroid
$> adb shell /data/local/tmp/mempodroid 0xd7cc 0xad27 sh /data/local/tmp/root.sh

Congratulations! You now have root, and the Superuser app is installed!
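
If you have more than one tablet to root, the same steps can be wrapped in a small script that stops at the first failure. This is a minimal sketch: the function names are made up for the example, and it assumes adb is on your PATH and that coby_root3 has been unpacked in the current directory.

```python
import subprocess
from typing import Callable, List

TMP = "/data/local/tmp"


def root_commands(adb: str = "adb") -> List[List[str]]:
    """The adb invocations from the steps above, as argument lists."""
    exploit = TMP + "/mempodroid"
    return [
        [adb, "push", "data", TMP],
        [adb, "shell", "chmod", "0777", exploit],
        [adb, "shell", exploit, "0xd7cc", "0xad27", "sh", TMP + "/root.sh"],
    ]


def run_all(
    commands: List[List[str]],
    cwd: str = "coby_root3",
    runner: Callable = subprocess.run,
) -> None:
    """Run each command from inside cwd, raising if adb reports an error."""
    for argv in commands:
        runner(argv, cwd=cwd, check=True)
```

Calling `run_all(root_commands())` performs the push, the chmod and the exploit in order. Note that `check=True` only catches failures that adb itself reports through its exit status.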

Installing Google apps

I tried a variety of things to get Gapps working, most of which just didn’t work. I did manage to get things working with this relatively simple process though.

Go to http://goo.im/gapps/ and download and decompress the gapps-ics-20120429-signed.zip file.
Now copy all of the apk files in there to your tablet:

$> cd gapps-ics-20120429-signed
$> adb push system/app /data/local/tmp
$> adb shell 
adb> su -
adb> mount -o remount,rw /system
adb> busybox cp /data/local/tmp/app/* /system/app
adb> exit

Reboot the tablet.

This was enough to get the Google Play store working and I could install Gmail and other apps.

If you have problems, shoot a message and I’ll be happy to try and help.

Root Bionic / Razr with Ice Cream Sandwich from OSX

Dan Rosenberg (@djrbliss), a security researcher, recently published an exploit for Motorola RAZR devices running Ice Cream Sandwich which allows you to gain root access.

Ice Cream Sandwich is about to be released for the Motorola Bionic, and fortunately Dan’s exploit also works on the Bionic.
Unfortunately, it only runs on MS Windows, so I ported his script to a bash script for Linux / OSX, which you can download as run.sh.

You need to download Dan’s code first and unzip it, and then download this run.sh into the same directory.
The script requires you to have the android-sdk installed locally so you can get the adb binary. Dan packages that up in his code for Windows but I don’t have it available for download here. Once you have it, make sure the path to the adb binary is correct in run.sh.
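
If you aren’t sure where adb ended up, something like the following can locate it. It is only a sketch: the function name is made up for this example, and it assumes the common conventions that the SDK location is stored in the ANDROID_HOME environment variable and that adb lives in the SDK’s platform-tools directory, which may not match your install.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def find_adb(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
    is_file: Callable[[str], bool] = os.path.isfile,
) -> Optional[str]:
    """Return a path to adb, or None if it cannot be found."""
    # First choice: adb is already on the PATH.
    found = which("adb")
    if found:
        return found
    # Fallback (assumed convention): look inside the SDK directory.
    sdk_root = env.get("ANDROID_HOME")
    if sdk_root:
        candidate = os.path.join(sdk_root, "platform-tools", "adb")
        if is_file(candidate):
            return candidate
    return None
```

Whatever `find_adb()` returns is the path to put into run.sh.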

Once those steps are complete, just run: bash run.sh and it should just work!
