« Posts tagged unix

Run a huge query as fast and safely as possible

Use this as a last resort

Queries that take a long time are generally a bad thing. If your application requires these kinds of measures to perform its duties, then chances are you really need to revise your table structures and/or you queries – ideally these queries should take seconds at the most, while data warehouse type reporting queries should be on the order of minutes. That said, sometimes you may need to update your entire schema, delete columns on a table with millions of records, or run a stored proc that goes in and cleans up data across several sets of tables across untold numbers of rows. If you try to run it from putty, or any other remote terminal and anything happens that might sever your connection, you might end up SOL with a rolled back exception that would leave you exactly where you started – with no data updated. These are some strategies you can use to mitigate the risk and cut down on the query time.

Try different strategies

Consider running a proc that pulls one million records, and then updates each record individually – you might want to get some popcorn since that update might take a while. That kind of update is a linear approach and generally bad because it will need to sequentially go through each record one at a time. Divide and conquer might work better – you could try batch updates across segments of the table where indexes are used – something like:

update table set column = value where constraint = 'arbitrary value';
update table set column = otherValue where constraint = 'some other value';

Another approach could be to reconstruct the table using the data from your target table, while filtering out or substituting in hardcoded values for the data you want to replace:

insert into clone_table
select primary_key, column, now() as activated_date, 
	other_column, true as is_active
from table 
where status = 'active'

You could use this approach to reconstruct your table with the data you want and then swap table references on the foreign keys. That part might get a little tricky but if you do it right, using select insert could end up saving you quite a bit of time – select inserts could take minutes while updates could take orders of magnitude longer.

Use screen to wrap your remote session

If your database is running on unix, without a doubt you’ll want to use screen if you need to run a very long query. If your database is on linux, I’m not sure there’s an equivalent. Anyone that’s used putty or some other terminal type of remote console app knows what it’s like to have some long running process terminate prematurely because the connection was severed, or your computer crashed. Screen saves you from those infrequent occurrences by creating an emulated session that can be detached/re-attached such that if you do get disconnected, you can go back and pick up where you left off. It’s very handy for executing a long running process where a disconnect would either cancel the proc and normally terminate the session.

To invoke screen, just type the word screen into the command prompt:

[root@bedrock ~]# screen

This will start your screen session. This may or may not provide some relevant information at the bottom of the screen like in the example below depending on your flavor of unix or configuration:

[root@bedrock ~]#

[ bedrock ][ (0*bash) ][2011-09-09 21:57 ]

Now that screen is up, you can disconnect your terminal app without fear that your screen session would terminate prematurely. You can then log back into the unix box and get a listing of all the current screen sessions with the following command:

[root@bedrock ~]# screen -ls
There are screens on:
     27470.pts-0.bedrock (Attached)
     8177.pts-0.bedrock (Detached)
     mySessionName (Detached)
3 Sockets in /var/run/screen/S-agonzalez.

I should point out that the session name is organized like [processId.sessionName]. You can name your session upon creation with the following command:

[root@bedrock ~]# screen -S yourSessionName

Once you’ve found the right screen session (they’re listed by session name) you can re-attach your severed session with the following command:

[root@bedrock ~]# screen -r mySessionName
There are screens on:
27470.pts-0.bedrock (Attached)
8177.pts-0.bedrock (Detached)
2 Sockets in /var/run/screen/S-agonzalez.

Once you’re in screen it’s useful to know a few keyboard commands to get around:

Control+c, pause then Control+d Detaches your session without termination
Control+c, then h screen capture, and save to your home directory as hardcopy.x (x being the number)
Control+c, then C (capital c) clear the screen of text
Control+c, then N (capital n) display information about the current screen window
Control+c, then ? help screen!

You can find more command, options and details at this screen manpage.

Run your query through a local pipe

If your query pulls back a lot of data, its going to require bandwidth to pipe it all back to your remote client. Don’t use remote clients (lik pgAdmin, MySQL workbench, SQuirreL etc) unless you’re running them directly on the box that’s running your database. Connect remotely and log in through a local pipe however you’re supposed to connect to the local command line:

[root@bedrock ~]# psql -l username my_pg_database
Welcome to psql 8.1.21, the PostgreSQL interactive terminal.

Type: copyright for distribution terms
h for help with SQL commands
? for help with psql commands
g or terminate with semicolon to execute query
q to quit


You would be amazed how much faster a query runs when you’re running it directly on the machine. To give you an idea – running an update across 2 million rows might take an hour if you’re running from a remote client, while running it directly on the box might take mere minutes. We’re talking orders of magnitude of performance – for most development remote is perfectly fine, but for heavy lifting you can’t beat a local pipe.

Now you can run your query… If you’re running on a local pipe and you’re running on screen, you should be able to sever your screen connection without terminating your super long query. Let’s hope that query doesn’t table lock everything to kingdom come!

Configure ssh authorized keys for cvs access

Continous Integration

Lately I’ve been working on adding Hudson as the continuous integration (CI) server for projects at work. The whole notion of CI merits an entire discussion, but suffice to say it’s a very clean, approach that helps automate the build process particularly if you run manual builds that use prompted shell scripts.

After looking at a few solutions, Hudson seemed from many accounts to be the easiest to get running, and pretty flexible when integrating into an existing build system. Add to the resume that it could run in a servlet container, divorcing environmental configuration from its automated build functionality, and we suddenly have a winner.

I went to work setting up integration build scripts and projects and all kinds or cool plugins when I finally hit a wall when it came time to wire up Hudson with cvs access. As it turns out, in our particular setup we access cvs via ssh, and ssh will usually require a password in order to connect to a remote host. When automating builds, this can be quite problematic since it seems that part of the argument is to allow the builds to fire off without interactive human intervention. I noticed that prompted passwords are very capable of raining that parade out.

I dug around for what seemed like forever, until it seemed that the solution was to enable authorized key access via ssh, and configure the generated public key to not require a pass phrase. In a nutshell, you can set up a public and private key, and configure it to require a pass phrase or not when requesting access. You then copy that public key to the remote machines you want to enable access to into the correct location. The last step is to configure authorized key access via ssh on the remote machine. Only then will you be able to ssh to the remote machine with the public key and without a password or pass phrase – in essance that public key becomes trusted authentication.

Here are the steps, with more detail:

Configure your connect-from machine

Let’s assume you’re going to use an account called builder for this example. In your shell as builder, cd into ~/.ssh and run:

ssh-keygen -f identity -C ‘buildier identiy cvs key’ -N ” -t rsa -q

This will create the set of keys for you without a pass phrase. The -C flag sets the comment tagged at the end of the key. You want to end up with a file structure like this:

[builder@connectFrom.ssh]# ls -l iden*
-rw——- 1 jboss CodeDeploy 1675 Dec 5 09:54 identity
-rw-r–r– 1 jboss CodeDeploy 405 Dec 5 09:54 identity.pub

on your connect-from machine. You will need to chmod the user’s home and .ssh directories to permission 0700. It turns out that these folder permissions are very picky and these keys will not work if the group or others have read/write access to that .ssh directory or its contents.

Configure your connect-to machine

You will now want to again create a ~/.ssh directory, also with permissions set to 0700 on the connect-to machine. Then use your favorite text editor to create the file: ~/.ssh/authorization_keys. This one’s even more strict – ensure that ~/.ssh/authorized_keys permissions is set to 0600. Paste the contents of your connect-from machine’s file ~/.ssh/identity.pub into this authorized_keys file contents. This step essentially copies the public key over as an authorized key to the remote machine. The file authorized_keys should have only one key per line, or it will cause problems. Lastly, we’ll need to make sure that the flag PubkeyAuthentication is enabled on the connect-to machine and that it it reads in the correct authorized_keys file.

Edit the file /etc/ssh/sshd_config file and uncomment the following:

PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys

Now you should be able to test the ssh connection with debugging enabled by saying form the connect-from machine’s shell:

[builder@connectFrom.ssh]ssh -v builder@connectTo

You should see connection information useful for debugging – looking for something like this:

debug1: Next authentication method: publickey
debug1: Offering public key: /home/builder/.ssh/identity
debug1: Server accepts key: pkalg ssh-rsa blen 277
debug1: read PEM private key done: type RSA
debug1: Authentication succeeded (publickey).
debug1: channel 0: new [client-session]
debug1: Entering interactive session.
debug1: Sending environment.
debug1: Sending env LANG = en_US.UTF-8
Last login: Thu Dec 2 01:17:40 2010 from connectFrom

Configure Hudson to use the external ssh
Now that these authorized keys have been configured for use, you can go into Hudson and set up the cvs connection string. You will need to make sure that the cvs advanced configuration is set to :

$CVS_RSH: ssh

And you should be all set.

Your builder account should now be able to access the remote machine using the trusted authorized keys.

How to allow SSH host keys on Linux (Fedora 10 & CentOS 5.2)
ssh – authorized_keys HOWTO
2.4.1 Connecting with rsh and ssh

*nix commands I can’t do without

Unix/Linux/*nix survival 101

Let me start with the obvious: I’m definitely not a unix guru by any means. I do however use it on a daily basis for basic build/development oriented tasks, so I know enough to get by. Since my friend just installed his first ever linux distribution (CentOS, Huzzah!), I thought I’d write something up on some common unix commands that help me get through the day.

grep [command flags] [search text] [filename]

grep (global | regular expression | print) is the file text search command. Give it a regular expression and it will print out what it finds in the file indicated by filename. Here’s an example:

[root@bedrock some_jboss_folder]$ grep html readme.html
<!DOCTYPE html PUBLIC “-//W3C//DTD HTML 4.01 Transitional//EN”>
<meta content=”text/html” http-equiv=”content-type”>
<a href=”http://docs.jboss.org/html”>here</a>.</li>
<li><a href=”http://www.jboss.org/index.html?module=bb”>JBoss
Server  is licensed under the <a href=”lgpl.html”>LGPL</a>,

Some useful flags include -R (recurse into sub directories), -c (show just the total match count), -m NUM (return the NUM number of results), and -i (ignore upper/lower case).

ps aux | grep [search text]

This is a command you can use to get information about what processes the kernel is currently running. Adding the pipe after the ps command feeds the listing results to the grep search command. This is particularly useful when you want to look for a specific set of procs run by a user or script. Here’s an example:

[root@bedrock ~]$ ps aux | grep jboss
jboss 10910 0.0 0.1 4884 1176 ? S Feb04 0:00 /bin/sh /server/jboss/bin/run.sh -c services -b -Djava.net.preferIPv4Stack=true
jboss 10932 0.2 36.4 1089728 371456 ? Sl Feb04 31:43 java -Dprogram.name=run.sh -Xms128m -Xmx512m -XX:MaxPermSize=256m -Dorg.jboss.resolver.warning=true -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -Djava.net.preferIPv4Stack=true -Djava.endorsed.dirs=/server/jboss/lib/endorsed -classpath /server/jboss/bin/run.jar org.jboss.Main -c services -b -Djava.net.preferIPv4Stack=true
500 20300 0.0 0.0 4200 700 pts/0 S+ 20:43 0:00 grep jboss

ps (process status) fetches a list of running pocsses. ax flags the command to return a listing of all procs. u flags to also list the user that the proc is running as. I use the grep to figure out if a jboss server is up and running, and sometimes to see what input parameters it used on startup – like what ip it bound to : “-b”. The results above lists first the user and process id, and then information about the proc.

netstat -ntalp | grep [search text]

This command must be run as root, but it lets you get a listing of network ports that are currently in use. This is particularly useful when trying to figure out port conflicts or to see if a particular server is listening on the correct port.

[root@bedrock ~]# netstat -ntalp | grep java
tcp 0 0* LISTEN 10932/java
tcp 0 0* LISTEN 10932/java
tcp 0 0* LISTEN 10932/java
tcp 0 0* LISTEN 10932/java
tcp 0 0* LISTEN 10932/java
tcp 0 0* LISTEN 10932/java

You can grep for port, ip/domain, status etc.

kill [signal flag] [process id]

This is the standard “kill process”, “terminate it dead” command. Usually when a proc refuses to shut down and all hell is breaking loose, and you can’t take no for an answer, signal flag “-9” will insta kill the proc. You can get the process id from the “ps aux | grep” command.

root@bedrock jboss]$ kill -9 10932

Here I took the process id from the jboss script that was running from the ps aux | grep command example listed above. Use ps aux to figure out which process id you want to terminate.

./run.sh [args]

This is the standard syntax for invoking a script, assuming you have run privileges. In windows you’d just type in the name of the script, but in unix you should prefix the script name with “./”.

As Dave Cheney explains in a comment:

    The reason you have to put “./” as a prefix to a script in your current working directory is the search path for executable programs does not (generally) include “.”
    To the shell, “.” expands to the current directory so ./run.sh is equivalent to /home/kevin/run.sh (for example). As you have provided a full absolute path, the shell will not have to try the prefixes available in your $PATH environment.

So essentially, by adding the “./” before the script name you feed the shell a fully qualified executable path to the script you want to run, that way it doesn’t have to guess where your script is. So if the script is named run.sh and your current working directory is in a folder named bin, you can invoke it like this:

root@bedrock bin]$ run.sh -c services -b -Djava.net.preferIPv4Stack=true

If your script takes parameters, you can pass them into the script after the script name.

tail [-f or -NUM] [path to file]

tail is a command that outputs the contents of a file to the terminal window. If you use the “-f” flag, it’ll continuously read the file as its contents grows. If you feed it a line count like “-1000” it will output the 1000 most recent line entries of the file. We’ll say something like – “Hey, I’m gonna tail the logs while the server starts up”. This means we’re monitoring the logs using this tail command. And knowing is half the battle.

Musk explains a better alternative that allow you to drop out of follow and search:

    You do not need tail use less +F or press Shift-F while in less and it will follow the currently choosen file if content is added.

    Example: log.txt

    less +F log.txt and you will have the same behaviour as when using tail -f log.txt except that you can use CTRL+C to drop out of follow mode and then use the search features available in less.

chown -R [group].[user]

This changes a file or directory’s owners to a new group/owner. The -R flag tell is to recurse the command into sub directories.

root@bedrock jboss]$ chown -R jboss.jboss

This command will work assuming there is a group and user named jboss, and it will change all files and folders in the current directory and lower to jboss.

chmod -R [permissions] [filename/expression]

This will set the permissions for the implied files to the new set of permissions listed. The mode can be indicated as either a string explanation of what each group can do or a 4 octal digit equivalent number.

[root@bedrock some_jboss_folder $ chmod ug=rwx,o=rw readme.html
[root@bedrock some_jboss_folder $ chmod 0775 readme.html

In the first example, we set the file owner (u) and group (g) to allow read (r), write (w) and execute (x). Then we set everyone else’s (o) permissions to read and write only, no execute. In the second example, we set it to 0775, which is the octal digit representation of the first command. 0777 will set read/write/exectue permissions allowed to everyone, its the same as ugo=rwx.

vi [filename]

Basic text editor *nix usually ships with. It will open up the indicated file in read mode, and if it doesn’t exist will let you create a new text file without saving to disk. To enter editor mode, hit the Insert key, you can then edit the file. After you make your edits, hit the Escape key to get into command line mode. If you want to save the file, enter “:w”. If you then want to quit, type in “:q”.

[root@bedrock some_jboss_folder ]$ vi readme.txt
<li>lib/ – the same
static library jars with a few jars, as most have moved to top level common/lib</li>
“readme.txtl” 718L, 36365C written

ping -c [ip/domain]

Pings an ip or domain with a packet of data. Unlike the windows cousin, you have to either pass in the number of times to ping (-c NUM) or hit control+c to stop pinging.

[root@bedrock some_jboss_folder ]$ ping -c 4 localhost
PING localhost.localdomain ( 56(84) bytes of data.
64 bytes from localhost.localdomain ( icmp_seq=1 ttl=64 time=0.040 ms
64 bytes from localhost.localdomain ( icmp_seq=2 ttl=64 time=0.046 ms
64 bytes from localhost.localdomain ( icmp_seq=3 ttl=64 time=0.033 ms
64 bytes from localhost.localdomain ( icmp_seq=4 ttl=64 time=0.048 ms

— localhost.localdomain ping statistics —
4 packets transmitted, 4 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.033/0.041/0.048/0.009 ms


This command takes over your terminal window and fills it with a listing of all the procs that are currently running, along with instruction crunching information. Hitting the < and > will scroll you through the results. q will quit top, returning you to the linux prompt. This is what it looks like:

top – 23:12:30 up 40 days, 16:38, 1 user, load average: 0.06, 0.02, 0.00
Tasks: 158 total, 1 running, 121 sleeping, 36 stopped, 0 zombie
Cpu(s): 0.0%us, 0.2%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1018232k total, 1003040k used, 15192k free, 138436k buffers
Swap: 2064376k total, 30236k used, 2034140k free, 312912k cached

2165 smmsp 20 0 9208 748 640 S 0.0 0.1 0:00.40 sendmail
1359 rpcuser 20 0 2988 560 556 S 0.0 0.1 0:00.03 rpc.statd
1346 rpc 20 0 2404 556 504 S 0.0 0.1 0:02.62 rpcbind
1 root 20 0 2012 624 560 S 0.0 0.1 0:04.71 init

man [command name]

If you need more detail on a specific command, you can get help from the unix manual by invoking man:

[root@bedrock ~]# man top
TOP(1) Linux Userâs Manual TOP(1)

top – display Linux tasks

top -hv | -bcHisS -d delay -n iterations -p pid [, pid …]

The traditional switches â-â and whitespace are optional.

The top program provides a dynamic real-time view of a running system. It can display system summary informa-
tion as well as a list of tasks currently being managed by the Linux kernel. The types of system summary
information shown and the types, order and size of information displayed for tasks are all user configurable
and that configuration can be made persistent across restarts.

ls [list flag] [path to directory]

This prints out a listing of the indicted directory’s contents, or the current directory if no path is supplied. -l lists one file/directory per line of output, and -a lists everything including files that start with a period.

[root@bedrock ~]# ls -la
total 168
drwxr-x—. 10 root root 4096 2009-11-20 23:09 .
drwxr-xr-x. 30 root root 4096 2010-02-07 22:32 ..
-rw——-. 1 root root 1675 2009-11-11 18:55 anaconda-ks.cfg
-rw——-. 1 root root 21354 2010-02-10 03:39 .bash_history
-rw-r–r–. 1 root root 18 2009-03-30 07:51 .bash_logout
-rw-r–r–. 1 root root 176 2009-03-30 07:51 .bash_profile
-rw-r–r–. 1 root root 176 2004-09-22 23:59 .bashrc
drwx——. 3 root root 4096 2009-11-12 02:17 .config
-rw-r–r–. 1 root root 100 2004-09-22 23:59 .cshrc
drwx——. 3 root root 4096 2009-11-11 19:06 .dbus

cat [filename1] [filename2] > [outputfile]

cat lets you concatenate and output the contents of a file of multiple files to the terminal window, or write it to a file if you include the “>” operator. Thanks Kevin. Here’s an example:


[root@bedrock ~]# vi test.txt
concatenate me!
this is a test


[root@bedrock ~]# vi concatenate.txt
a file that needs to be concatenated



[root@bedrock ~]# cat test.txt concatenate.txt > output.txt

The result

[root@bedrock ~]# more output.txt
concatenate me!
this is a test
a file that needs to be concatenated


sed -i ’s/[some_text/other_text]/’ [filename]/

sed – stream editor for filtering and transforming text (blatantly stolen from “man sed”‘s documentation). This command will replace “some_text” with “other_text” in the file indicated. One occurrence per line is replaced. Thanks for this one Silvery.

Consider the file “test.txt”:

[root@bedrock jboss]# more test.txt
this is a file
this ia another file
lets faceroll files

And this is what happens when we run sed on it:

[root@bedrock jboss]# sed -i ‘s/file/folder/’ test.txt
[root@bedrock jboss]# more test.txt
this is a folder
this ia another folder
lets faceroll folders
[root@bedrock jboss]#


more/less – enables you to view the contents of a file within a page on the screen. Once you are browsing the contents, you can hit “s” or “f” to scroll multiple lines of text. “v” will fire up an editor at the current line you’re working on. If you have a large list of files and want to check them one page at a time, you could try “ls | less”. Thanks again Kevin.


Clears the visible screen of text, starting your prompt at the top of the window.

mkdir [directory name]

This command simply creates a directory with default permissions and ownership.

cp -R [source] [destination directory]

Copies a file/folder from one location into another. -R flags to copy recursively.

mv [source] [destination directory]

Renames a source directory or folder to a new location/name.

rm -Rf [folder/file]

Deletes a file. -R flags to delete recursively. When invoked on a directory it would normally go line by line asking you if you want to delete such-and-such file, use the “f” flag to force delete and skip the file by file questions.

cd [path to new directory]

cd changes the current directory to the path indicated. A “..” means to move up one directory. If the path begins with “/” it means start from the disk root folder. Anything else implies a relative path to the new folder.

– -color=auto

In his comment, Ryan Fox points out:

    The `- -color=auto` option adds colour to the output of some commands, like ls or grep. In ls, the colours change depending on the file type, permissions, etc. In grep, it will highlight the text that matched your regex.

Here’s an example:

[root@bedrock jboss]# ps aux | grep jboss – -color=auto
root 5215 0.0 0.0 4200 712 pts/1 S+ 05:56 0:00 grep jboss –color=auto
jboss 10910 0.0 0.1 4884 1176 ? S Feb04 0:00 /bin/sh /server/jboss/bin/run.sh -c services -b

Open ended

I’m sure there must be other useful commands I have missed. If anyone has any other suggestions to add/edit these entries, please feel free to comment and I’ll update accordingly.