« Posts tagged terminal

Run a huge query as fast and safely as possible

Use this as a last resort

Queries that take a long time are generally a bad thing. If your application requires these kinds of measures to perform its duties, then chances are you really need to revise your table structures and/or you queries – ideally these queries should take seconds at the most, while data warehouse type reporting queries should be on the order of minutes. That said, sometimes you may need to update your entire schema, delete columns on a table with millions of records, or run a stored proc that goes in and cleans up data across several sets of tables across untold numbers of rows. If you try to run it from putty, or any other remote terminal and anything happens that might sever your connection, you might end up SOL with a rolled back exception that would leave you exactly where you started – with no data updated. These are some strategies you can use to mitigate the risk and cut down on the query time.

Try different strategies

Consider running a proc that pulls one million records, and then updates each record individually – you might want to get some popcorn since that update might take a while. That kind of update is a linear approach and generally bad because it will need to sequentially go through each record one at a time. Divide and conquer might work better – you could try batch updates across segments of the table where indexes are used – something like:

update table set column = value where constraint = 'arbitrary value';
update table set column = otherValue where constraint = 'some other value';

Another approach could be to reconstruct the table using the data from your target table, while filtering out or substituting in hardcoded values for the data you want to replace:

insert into clone_table
select primary_key, column, now() as activated_date, 
	other_column, true as is_active
from table 
where status = 'active'

You could use this approach to reconstruct your table with the data you want and then swap table references on the foreign keys. That part might get a little tricky but if you do it right, using select insert could end up saving you quite a bit of time – select inserts could take minutes while updates could take orders of magnitude longer.

Use screen to wrap your remote session

If your database is running on unix, without a doubt you’ll want to use screen if you need to run a very long query. If your database is on linux, I’m not sure there’s an equivalent. Anyone that’s used putty or some other terminal type of remote console app knows what it’s like to have some long running process terminate prematurely because the connection was severed, or your computer crashed. Screen saves you from those infrequent occurrences by creating an emulated session that can be detached/re-attached such that if you do get disconnected, you can go back and pick up where you left off. It’s very handy for executing a long running process where a disconnect would either cancel the proc and normally terminate the session.

To invoke screen, just type the word screen into the command prompt:

[root@bedrock ~]# screen

This will start your screen session. This may or may not provide some relevant information at the bottom of the screen like in the example below depending on your flavor of unix or configuration:

[root@bedrock ~]#

[ bedrock ][ (0*bash) ][2011-09-09 21:57 ]

Now that screen is up, you can disconnect your terminal app without fear that your screen session would terminate prematurely. You can then log back into the unix box and get a listing of all the current screen sessions with the following command:

[root@bedrock ~]# screen -ls
There are screens on:
     27470.pts-0.bedrock (Attached)
     8177.pts-0.bedrock (Detached)
     mySessionName (Detached)
3 Sockets in /var/run/screen/S-agonzalez.

I should point out that the session name is organized like [processId.sessionName]. You can name your session upon creation with the following command:

[root@bedrock ~]# screen -S yourSessionName

Once you’ve found the right screen session (they’re listed by session name) you can re-attach your severed session with the following command:

[root@bedrock ~]# screen -r mySessionName
There are screens on:
27470.pts-0.bedrock (Attached)
8177.pts-0.bedrock (Detached)
2 Sockets in /var/run/screen/S-agonzalez.

Once you’re in screen it’s useful to know a few keyboard commands to get around:

Control+c, pause then Control+d Detaches your session without termination
Control+c, then h screen capture, and save to your home directory as hardcopy.x (x being the number)
Control+c, then C (capital c) clear the screen of text
Control+c, then N (capital n) display information about the current screen window
Control+c, then ? help screen!

You can find more command, options and details at this screen manpage.

Run your query through a local pipe

If your query pulls back a lot of data, its going to require bandwidth to pipe it all back to your remote client. Don’t use remote clients (lik pgAdmin, MySQL workbench, SQuirreL etc) unless you’re running them directly on the box that’s running your database. Connect remotely and log in through a local pipe however you’re supposed to connect to the local command line:

[root@bedrock ~]# psql -l username my_pg_database
Welcome to psql 8.1.21, the PostgreSQL interactive terminal.

Type: copyright for distribution terms
h for help with SQL commands
? for help with psql commands
g or terminate with semicolon to execute query
q to quit

ml_publisher=#

You would be amazed how much faster a query runs when you’re running it directly on the machine. To give you an idea – running an update across 2 million rows might take an hour if you’re running from a remote client, while running it directly on the box might take mere minutes. We’re talking orders of magnitude of performance – for most development remote is perfectly fine, but for heavy lifting you can’t beat a local pipe.

Now you can run your query… If you’re running on a local pipe and you’re running on screen, you should be able to sever your screen connection without terminating your super long query. Let’s hope that query doesn’t table lock everything to kingdom come!