Ops 101: Change Control

Intro

A while back I decided I wanted to put down on paper some of the lessons I learned working as a systems admin, including time in an enterprise environment with thousands of servers. Some lessons I learned in the small shops, some I learned at bigger shops. This is going to be the first in my “Ops 101” series, and in a broader series of essays about lessons I’ve learned.

The Processes

Process. Procedures. We all hate doing them. Without them, however, all the other stuff never gets done. Sure, you’ll start off documenting the changes you make and checking in that script, but unless there’s a process in place and accountability to back that process up, a couple of months down the line you’re going to be asking, “wait, where does that script run again?” and “that was upgraded? Since when?!” Without good process, you’ll lose all of those good habits that you’ll thank god you have when things start to break.

Change Management

Change management, at its core, is actually a very simple thing: What. When. Why. Where. How. Specifically: What do we want to change? Why do we want to change it? When do we want to make this change? Where do we want to make it? And how do we want to make it? It’s that simple. The hard part is getting people to do it. The benefits are huge, however. I can’t count the number of times we’ve found an issue, started tracking it back, dug through the logs and found that it happened yesterday at three pm. Four out of five times you can go back and see that some app was deployed the previous day at about the same time, and now the problem is easy to solve. In addition, no matter how good the documentation you’re going to get from dev is (and let’s face it, in services… usually it ain’t so good), it is worth all the hassle in the world to be able to look up how you did something a year ago when it randomly comes up again.
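If it helps to see the five questions as something you can paste into a ticket, here is a minimal sketch. The function name and the field layout are my own invention for illustration, not anything your ticketing system requires.

```shell
# Hypothetical helper: print a change-request skeleton that covers the
# five questions. Field names and layout are assumptions for this sketch.
new_change_request() {
    # $1 = one-line summary of the change
    summary=$1
    cat <<EOF
Summary: ${summary}
What:  (what do we want to change?)
Why:   (why do we want to change it?)
When:  (when do we want to make the change?)
Where: (which hosts or services does it touch?)
How:   (the exact steps to carry it out)
EOF
}

# Example: start a ticket for a mail server upgrade
new_change_request "Upgrade mail server"
```

Paste the output into a new ticket and fill in the blanks before the change meeting.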

Change management basics

Here is everything you will need for a successful change management system:

  • A meeting. Sorry: you’ll just have to live with it. 30 minutes a day won’t kill you
  • A ticketing system
  • Signoff from the ops team that they will use it 100% of the time
  • Signoff from the rest of the company that they will only escalate things using the ticketing system

The changes

Changes fall into three basic buckets: Emergency Change Requests (ECR), (Scheduled) Change Requests (CR), and Standard Operating Procedures (SOP). These should be fairly self-explanatory: ECRs happen in emergencies. If an army of zombies breaks into the office, your first thought should rightly be: how do I deal with the zombies? Afterwards, providing you live, you would create an ECR. This will enable the next guy to do a quick search for “zombies” in the ticketing system and see that there is an emergency shotgun hidden behind the UPS in the server room, thus not having to go through the harrowing ordeal of sacrificing all of those sales guys before remembering it was there. He could, instead, just sacrifice *some* of the sales guys.

CRs will come up in a lot of different ways. Client Services will request things. Deployments will need to be done. Sysadmins will think of better ways to do things. Change happens. Anything you don’t need to do *right now* goes into the CR bucket. You look at these in your change meeting, decide if and when they should be executed and whether the process of the change can be improved. Then you dole them out to your various system admins to do the actual work.

SOPs are the basic “can you run that script that you put together that fixes the mailserver again?” type stuff. You do it often enough that it’s “no big deal”. The ticket is just there a) for tracking purposes and b) so you can look it up when the guy who runs the script is out. These will eventually be a good chunk of the stuff in your wiki, but more on that later.
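The three buckets boil down to two questions: does it have to happen right now, and is it something routine you already have a procedure for? A rough sketch of that triage follows. The function name and its yes/no arguments are assumptions made up for this example.

```shell
# Hypothetical triage helper: map a request onto the three buckets.
# The two yes/no inputs are assumptions made for this sketch.
change_bucket() {
    # $1 = "yes" if it must be dealt with right now
    # $2 = "yes" if it is a routine task with a known procedure
    if [ "$1" = "yes" ]; then
        echo "ECR"
    elif [ "$2" = "yes" ]; then
        echo "SOP"
    else
        echo "CR"
    fi
}

change_bucket yes no   # zombies in the office: prints ECR
change_bucket no yes   # run the mailserver fix script again: prints SOP
change_bucket no no    # a planned deployment: prints CR
```

Real requests are fuzzier than two flags, of course, which is part of what the change meeting is for.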

The Change Review Meeting

ECRs generally get phone approval from someone in management and then are documented afterward. They generally are followed by a meeting to explain what the heck happened, and a formal root cause analysis.

The agenda is simple: is it approved? Who’s doing it? When do we want to do it? Next. This should be a quick meeting. It won’t be. Who should be there: the Senior Systems Admin, the Director of Operations and a representative from each of the other teams. Dev and CS, at least, should have a seat at the table; others are probably optional.

The Software

There are many out there, and as you grow you might want to look at purchasing a commercial ticketing system or making modifications to your existing one, but given the price tag of free and how widely used it is, I’d recommend RT [i] from bestpractical.com. It’s simple, it’s good and it works.

RPM Install page [ii]

Current version-release: 3.4.5-2

Summary: This aims to be the solution for an easy RPM install of RT on RHEL4/CentOS4. (These packages have also been reported to run under Fedora Core 4, but it seems Fedora now has its own packages; see the RPM Install page for details.)

Download:

http://campus.fct.unl.pt/paulomatos/rt/repository/

Old releases: rt-3.0.10-3 still available under 3.0.x directory.

WARNING: These packages were built on the assumption that SELinux is turned off (*any help making them support both modes would be great!*).

Package Description

rt

It was built with mysql and apache2/modperl2 (2.0.1). It has no patches at the moment, but may need some to correct known problems. To see the details at any time, run:

rpm -qp --changelog rt-<version>-<release>.noarch.rpm

rt-mail-dispatcher

This is a setup for an RT mail dispatcher using sendmail and procmail. It is based on the assumption that you use one domain for all your RT queues, e.g. @rt.yourdomain.com. This allows you to set up queues in RT using the following address convention:

correspondence address: queuename@rt.yourdomain.com

comment address: queuename-comment@rt.yourdomain.com

without having to reconfigure your mail settings every time.

‘postmaster’ is reserved to stay RFC822 compliant and should be set up correctly; it defaults to the user postmaster. You can always change it to be an RT queue as well.
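To make the address convention concrete, here is a small sketch of how an incoming address maps to a queue and an action. This is not the dispatcher’s actual code (the package does the work with sendmail and procmail rules); rt_route is a made-up name, and the action words follow the correspond/comment naming used by RT’s mail gateway.

```shell
# Sketch only: split an RT mail address into "queue action" following the
# queuename@ / queuename-comment@ convention. rt_route is a hypothetical
# name; the real dispatcher does this with procmail rules.
rt_route() {
    local_part=${1%%@*}    # drop everything from the @ onward
    case "$local_part" in
        *-comment) echo "${local_part%-comment} comment" ;;
        *)         echo "$local_part correspond" ;;
    esac
}

rt_route support@rt.yourdomain.com           # prints: support correspond
rt_route support-comment@rt.yourdomain.com   # prints: support comment
```

One thing the sketch makes obvious: a queue whose own name ends in “-comment” would clash with this convention, so pick queue names accordingly.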

Installation Notes

With yum (http://linux.duke.edu/projects/yum/download.ptml)

RT’s three-step install procedure:

  1. Download the file: http://campus.fct.unl.pt/paulomatos/rt/repository/3.4.x/rt-3.4.x.repo
  2. Copy it to /etc/yum.repos.d/, or append it to /etc/yum.conf:

cat rt-3.4.x.repo >> /etc/yum.conf

     or fetch it straight into /etc/yum.repos.d/:

cd /etc/yum.repos.d/
wget http://campus.fct.unl.pt/paulomatos/rt/repository/3.4.x/rt-3.4.x.repo

  3. Then type, as ‘root’:

yum install rt rt-mail-dispatcher

You’ll have rt installed in no time… then all you have to do is configure a few settings as the messages suggest.  

Note: Depending on which Perl modules you have installed in the past, you may have to update before installing via yum. If yum install reports a whole lot of dependency errors, type the following:

yum update
yum install rt rt-mail-dispatcher

Without yum

Just download everything to a directory and run:

rpm -Uvh *.rpm

Post Installation Notes

A user pointed out to me that he was in such a hurry to try it out that he lost the messages that appeared after the install. He also suggested I create a file with those messages inside. Meanwhile, here they are:

  • rt

Run

cp /etc/rt/RT_Config.pm /etc/rt/RT_SiteConfig.pm

to generate an editable site config file.

You must now configure RT by editing /etc/rt/RT_SiteConfig.pm and /etc/httpd/conf.d/rt.conf.

(You will definitely need to set RT’s database password before continuing. Not doing so could be very dangerous.)

After that, you need to initialize RT’s database by running

/usr/sbin/rt-setup-database --action init --dba root --prompt-for-dba-password

If something goes wrong you can always drop everything by executing

/usr/sbin/rt-setup-database --action drop --dba root --prompt-for-dba-password

  • rt-mail-dispatcher

You must now configure some things by editing /var/rt/home/.procmailrc; please read /usr/share/doc/rt-3.4.5/README.mail-dispatcher.

LINKS:

[i] http://bestpractical.com/

[ii] http://wiki.bestpractical.com/view/RPMInstall
