Install Scala on IntelliJ 15

In order to work with Scala in IntelliJ, you will need to install the Scala plugin from the JetBrains plugin repository.

From the open project view, you’ll want to head over to the preferences screen from the IntelliJ menu.

And then head over to the plugins section.

From here you’ll want to click on “Install JetBrains plugin…”

And then search for Scala. Once you’ve located the plugin, click the install button and restart IntelliJ. Once installed, you will need to create a new project to see the newly available Scala options.

Pick Scala from the options, walk through the rest of the setup, and you will then have a new Scala project in IntelliJ.

How to configure Geb/Spock with Gradle

Geb/Spock + Gradle

Well, it turns out you have to use the right versions of geb-core, geb-spock and spock-core, not to mention the right version of Groovy. The problem appears to be that the Geb/Spock integration jar (geb-spock:0.7.2) was built against Groovy 1.8, and the Groovy 2.x series just hasn’t caught up yet. This means trying to get Geb/Spock working on Gradle 2.x (which bundles Groovy 2.x) just won’t happen – you will get ClassNotFoundExceptions and an urge to pull your hair out. After digging around and trying various combinations I finally settled on Spock 0.6 on Gradle 1.8, Geb/Spock 0.7.2, and Geb 0.7.2. Note that the Geb and Geb/Spock integration jars should be the same version. My Gradle dependencies wound up looking like this:

dependencies {

	def seleniumVersion = "2.42.2"
	def phantomJsVersion = '1.1.0'
	def cargoVersion = '1.4.9'

	// selenium drivers
	compile "org.seleniumhq.selenium:selenium-ie-driver:$seleniumVersion"
	compile "org.seleniumhq.selenium:selenium-chrome-driver:$seleniumVersion"
	compile "org.seleniumhq.selenium:selenium-firefox-driver:$seleniumVersion"
	compile "org.seleniumhq.selenium:selenium-support:$seleniumVersion"
	compile("com.github.detro.ghostdriver:phantomjsdriver:$phantomJsVersion") {
		transitive = false
	}

	// geb
	compile 'org.codehaus.geb:geb-core:0.7.2'
	compile 'org.codehaus.geb:geb-spock:0.7.2'

	// spock
	compile 'org.spockframework:spock-core:0.6-groovy-1.8'

	compile 'junit:junit:4.8.2'
	compile 'org.slf4j:slf4j-log4j12:1.7.6@jar'
	compile 'org.slf4j:slf4j-api:1.7.6@jar'

}

I wanted to create a separate task just to run these Geb/Spock tests so did the following:

task acceptanceTest(type: Test, dependsOn: [compileTestGroovy]) {

	maxParallelForks = 5
	forkEvery = 5

	include 'com/something/acceptance/**'

	doFirst {
		println '\nStarting tomcat via cargo'
		tasks.cargoStartLocal.execute()
	}

	doLast {
		println '\nStopping tomcat via cargo'
		tasks.cargoStopLocal.execute()
	}

	def timestamp

	beforeTest { descriptor ->
		timestamp = new Date()
	}

	afterTest { desc, result ->
		logger.lifecycle("\n\n>>> Running " + "${desc.name} [${desc.className}]")
		println "Executed ${desc.name} [${desc.className}] with result: " +
			"${result.resultType} in ${new Date().getTime() - timestamp.getTime()}ms"
	}

}

Since my Geb tests are written in Groovy, I’ve structured my project so that my acceptance tests live in the proper Groovy source directory, and now I can run Geb tests just like regular unit and integration tests. Heck, I can even bundle Cargo with it and have it run my application, fire up the Geb/Spock tests, and then shut down the app in one fell swoop. The final script looks like this:

buildscript {
	repositories {
		jcenter()
	}
	dependencies {
		classpath 'com.bmuschko:gradle-cargo-plugin:2.0.3'
	}
}

apply plugin: 'java'
apply plugin: 'groovy'
apply plugin: 'com.bmuschko.cargo'

repositories {
	jcenter()
	mavenCentral()
}

dependencies {

	def seleniumVersion = "2.42.2"
	def phantomJsVersion = '1.1.0'
	def cargoVersion = '1.4.9'

	// selenium drivers
	compile "org.seleniumhq.selenium:selenium-ie-driver:$seleniumVersion"
	compile "org.seleniumhq.selenium:selenium-chrome-driver:$seleniumVersion"
	compile "org.seleniumhq.selenium:selenium-firefox-driver:$seleniumVersion"
	compile "org.seleniumhq.selenium:selenium-support:$seleniumVersion"
	compile("com.github.detro.ghostdriver:phantomjsdriver:$phantomJsVersion") {
		transitive = false
	}

	// geb
	compile 'org.codehaus.geb:geb-core:0.7.2'
	compile 'org.codehaus.geb:geb-spock:0.7.2'

	// spock
	compile 'org.spockframework:spock-core:0.6-groovy-1.8'

	// cargo support
	cargo "org.codehaus.cargo:cargo-core-uberjar:$cargoVersion",
		"org.codehaus.cargo:cargo-ant:$cargoVersion"

	compile 'junit:junit:4.8.2'
	compile 'org.slf4j:slf4j-log4j12:1.7.6@jar'
	compile 'org.slf4j:slf4j-api:1.7.6@jar'

}

// == test configurations == //

task acceptanceTest(type: Test, dependsOn: [compileTestGroovy]) {

	maxParallelForks = 5
	forkEvery = 5

	include 'com/something/acceptance/**'

	doFirst {
		println '\nStarting tomcat via cargo'
		tasks.cargoStartLocal.execute()
	}

	doLast {
		println '\nStopping tomcat via cargo'
		tasks.cargoStopLocal.execute()
	}

	def timestamp

	beforeTest { descriptor ->
		timestamp = new Date()
	}

	afterTest { desc, result ->
		logger.lifecycle("\n\n>>> Running " + "${desc.name} [${desc.className}]")
		println "Executed ${desc.name} [${desc.className}] with result: " +
			"${result.resultType} in ${new Date().getTime() - timestamp.getTime()}ms"
	}

}

// == cargo configuration == //

cargo {
	containerId = 'tomcat7x'
	port = 8080

	deployable {
		file = file("target/path/to/application.war")
		context = "/"
	}

	local {
		installer {
			installUrl = 'http://archive.apache.org/dist/tomcat/tomcat-7/v7.0.54/bin/apache-tomcat-7.0.54.zip'
			downloadDir = file("tomcat/download")
			extractDir = file("tomcat/extract")
		}
	}
}

As you can see, before the Geb tests run, I invoke the cargoStartLocal task to fire up tomcat7, and I’ve configured cargo such that it will download tomcat7 from apache, extract the archive, and then deploy my war file on port 8080. Once the Geb tests complete, cargo will shut down the app, and my automated acceptance tests will be complete.
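For reference, one of the specs picked up by that acceptanceTest task could look something like this minimal sketch (the package matches the include pattern above; the URL and expected title are placeholders):

package com.something.acceptance

import geb.spock.GebSpec

class HomePageSpec extends GebSpec {

	def "home page renders"() {
		when:
		go "http://localhost:8080/"

		then:
		title == "My Application"
	}
}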

Happy testing!

Subverting foreign key constraints in postgres… or mysql

Temporarily disable key constraints?

On Postgres (version 8.1, mind you) I ran across a scenario where I had to update a set of records that carried foreign key constraints with other tables. I was tasked with updating this table, and the new data could end up in a state with broken key constraints. The normal Postgres replace function would not work, as there was no natural regex replace I could run that would affect all the entries the way I wanted without breaking FK constraints. Ultimately I had to break down my queries in such a way that, at the end of the transaction, the constraints would check out. It turns out that in Postgres, when you define a foreign key, you can flag it as DEFERRABLE:

ALTER TABLE tb_other ADD CONSTRAINT tb_other_to_table_fkey 
	FOREIGN KEY (tb_table_pk) REFERENCES tb_table (tb_table_pk) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY IMMEDIATE;

With the alter table command above we can then make use of this DEFERRABLE clause – this flag tells Postgres that the constraint check may be deferred until the end of the transaction. The INITIALLY IMMEDIATE clause tells Postgres that the default behavior is to check the constraint immediately, when the transaction attempts to perform the corresponding delete or insert. You can also flag the constraint as INITIALLY DEFERRED. Initially deferring, as you might guess, tells Postgres to check the constraint at the end of the transaction. Generally, if you want constraints, you will probably want to check them immediately – but it’s good to know you have the option if you really need it.
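If you know a given constraint should always be checked at commit time, you could declare it INITIALLY DEFERRED instead – same definition as above, just a different default:

ALTER TABLE tb_other ADD CONSTRAINT tb_other_to_table_fkey 
	FOREIGN KEY (tb_table_pk) REFERENCES tb_table (tb_table_pk) MATCH SIMPLE
	ON UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED;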

Once the foreign key constraint is set as deferrable, we can then execute a script like this to defer the constraint checks until the end of the transaction:

-- postgres deferred constraints in action
begin;

SET CONSTRAINTS ALL DEFERRED;

delete from tb_table;

insert into tb_table values ( nextval('sq_table'), value1, value2, value3);
insert into tb_table values ( nextval('sq_table'), value1, value2, value3);
insert into tb_table values ( nextval('sq_table'), value1, value2, value3);

commit;

Pretty useful in my opinion. I think I prefer this solution over disabling triggers across the table, since disabling triggers is a schema change and you end up being responsible for restoring them once you’re done. Consider the following:

-- postgres disabled triggers
begin;

ALTER TABLE tb_table DISABLE TRIGGER ALL;

delete from tb_table;

insert into tb_table values ( nextval('sq_table'), value1, value2, value3);
insert into tb_table values ( nextval('sq_table'), value1, value2, value3);
insert into tb_table values ( nextval('sq_table'), value1, value2, value3);

-- make sure to restore the triggers
ALTER TABLE tb_table ENABLE TRIGGER ALL;

commit;

In this implementation you end up altering the schema to disable all the triggers associated with the table. Don’t forget to re-enable the triggers at the end of the transaction, or the disabling will remain in place. Another thing to consider: if you have auditing-type triggers on your target table, you will end up having to manually fire those triggers or run the appropriate statements to preserve the original triggers’ integrity. This kind of thing could quickly turn into quite the problem if not handled correctly.

Mysql’d keys

The MySQL approach handles this case very similarly to the disabled triggers – instead, it uses a system variable called FOREIGN_KEY_CHECKS that can be toggled on or off:

-- mysql key constraint suppression
begin;

-- lift 
SET FOREIGN_KEY_CHECKS=0;

delete from tb_table;

insert into tb_table values ( nextval(sq_table), value1, value2, value3);
insert into tb_table values ( nextval(sq_table), value1, value2, value3);
insert into tb_table values ( nextval(sq_table), value1, value2, value3);

-- put back when you're done
SET FOREIGN_KEY_CHECKS=1;

commit;

As you can see, it’s a very similar approach to the trigger disabling in Postgres. From the documentation at the time of this writing (MySQL version 5.5 – Deferred Foreign Keys in MySQL), it looks like deferred keys are just not an option in MySQL, even though they’re listed as part of the standard. Worth noting.

References:
Postgres Set Constraints
Postgres Create Table documentation

Run a huge query as fast and safely as possible

Use this as a last resort

Queries that take a long time are generally a bad thing. If your application requires these kinds of measures to perform its duties, then chances are you really need to revise your table structures and/or your queries – ideally these queries should take seconds at most, while data-warehouse-style reporting queries should be on the order of minutes. That said, sometimes you may need to update your entire schema, delete columns on a table with millions of records, or run a stored proc that goes in and cleans up data across several sets of tables and untold numbers of rows. If you try to run it from PuTTY or any other remote terminal and anything severs your connection, you might end up SOL with a rolled-back transaction that leaves you exactly where you started – with no data updated. These are some strategies you can use to mitigate the risk and cut down on the query time.

Try different strategies

Consider running a proc that pulls one million records and then updates each record individually – you might want to get some popcorn, since that update might take a while. That kind of update is a linear approach and generally bad, because it needs to go through each record sequentially, one at a time. Divide and conquer might work better – you could try batch updates across segments of the table where indexes are used, something like:

update table set column = value where constraint = 'arbitrary value';
update table set column = otherValue where constraint = 'some other value';

Another approach could be to reconstruct the table using the data from your target table, while filtering out or substituting in hardcoded values for the data you want to replace:

insert into clone_table
select primary_key, column, now() as activated_date, 
	other_column, true as is_active
from table 
where status = 'active';

You could use this approach to reconstruct your table with the data you want and then swap the table references on the foreign keys. That part might get a little tricky, but if you do it right, using a select-insert could end up saving you quite a bit of time – select-inserts could take minutes while row-by-row updates could take orders of magnitude longer.
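The swap itself might look something like the sketch below, where original_table stands in for the source table from the select above – any foreign keys pointing at it would still need to be re-pointed or recreated:

-- swap the rebuilt table in place of the original
begin;
alter table original_table rename to original_table_old;
alter table clone_table rename to original_table;
-- recreate or re-point foreign key constraints against the new table here
commit;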

Use screen to wrap your remote session

If your database is running on a unix-like host, without a doubt you’ll want to use screen if you need to run a very long query. (If your database is on Windows, I’m not sure there’s an equivalent.) Anyone that’s used PuTTY or some other terminal type of remote console app knows what it’s like to have some long running process terminate prematurely because the connection was severed, or your computer crashed. Screen saves you from those infrequent occurrences by creating an emulated session that can be detached and re-attached, so that if you do get disconnected, you can go back and pick up where you left off. It’s very handy for executing a long running process where a disconnect would otherwise cancel the proc and terminate the session.

To invoke screen, just type the word screen into the command prompt:

[root@bedrock ~]# screen

This will start your screen session. Depending on your flavor of unix or your configuration, it may or may not show some session information at the bottom of the screen, like in the example below:

[root@bedrock ~]#

[ bedrock ][ (0*bash) ][2011-09-09 21:57 ]

Now that screen is up, you can disconnect your terminal app without fear that your screen session would terminate prematurely. You can then log back into the unix box and get a listing of all the current screen sessions with the following command:

[root@bedrock ~]# screen -ls
There are screens on:
     27470.pts-0.bedrock (Attached)
     8177.pts-0.bedrock (Detached)
     mySessionName (Detached)
3 Sockets in /var/run/screen/S-agonzalez.

I should point out that the session name is organized like [processId.sessionName]. You can name your session upon creation with the following command:

[root@bedrock ~]# screen -S yourSessionName

Once you’ve found the right screen session (they’re listed by session name) you can re-attach your severed session with the following command:

[root@bedrock ~]# screen -r mySessionName
There are screens on:
27470.pts-0.bedrock (Attached)
8177.pts-0.bedrock (Detached)
2 Sockets in /var/run/screen/S-agonzalez.

Once you’re in screen it’s useful to know a few keyboard commands to get around:

Control+a, then d – detaches your session without terminating it
Control+a, then h – screen capture, saved as hardcopy.x (x being the window number)
Control+a, then C (capital c) – clears the screen of text
Control+a, then N (capital n) – displays the number and title of the current screen window
Control+a, then ? – help screen!

You can find more commands, options and details at this screen manpage.

Run your query through a local pipe

If your query pulls back a lot of data, it’s going to require bandwidth to pipe it all back to your remote client. Don’t use remote clients (like pgAdmin, MySQL Workbench, SQuirreL, etc.) unless you’re running them directly on the box that’s running your database. Connect remotely and log in through a local pipe, however you’re supposed to connect to the local command line:

[root@bedrock ~]# psql -U username my_pg_database
Welcome to psql 8.1.21, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

ml_publisher=#

You would be amazed how much faster a query runs when you’re running it directly on the machine. To give you an idea – running an update across 2 million rows might take an hour from a remote client, while running it directly on the box might take mere minutes. We’re talking orders of magnitude in performance – for most development work remote is perfectly fine, but for heavy lifting you can’t beat a local pipe.

Now you can run your query… If you’re on a local pipe and running inside screen, you should be able to sever your screen connection without terminating your super long query. Let’s hope that query doesn’t table-lock everything to kingdom come!

Configuring Data Sources, JBoss 7

Yep it’s gonna be a big year for JBoss AS 7

This will be the first in a series I’ll be writing on JBoss’ new application server, version 7. Lately I’ve been playing around with JBoss AS 7, and all I can say is.. !@#%, NICE! I downloaded 7.0 with the expectation that it would honor a lot of the previous version’s overall approach and layout. I was in for a BIG surprise. It comes off as a total rewrite, leveraging a lot of the latest and greatest technologies and frameworks – things like Weld (an implementation of the Contexts and Dependency Injection spec, JSR-299), OSGi (the Open Services Gateway initiative framework, for the uninitiated), Hibernate, and RESTEasy.

I’ll say the guys over at JBoss certainly delivered. Before, server start-up times could take a respectable 30 seconds to a minute or more depending on your deployment structure and dependencies. Now? Less time than my 15 second ant build script! Right now I’m clocking 14 seconds from cold to deployed on my smaller sized application. With AS 5, the same deployment was taking something like a minute. Hats off, you all at JBoss really did some work!

The first and arguably most difficult thing you’ll want to do is set up the data source for your deployment.

Configuring the Data Source

Before, we had to configure our postgres-ds.xml file with all the data source metadata required by our application. The process now isn’t as straightforward – there are three ways to do it, two if you don’t count using the really nice console manager it ships with. I should mention that there are now two types of configuration setups: 1) domain and 2) standalone. Standalone is the model we’re most familiar with – a single instance acting as a single server. Domain, on the other hand, is geared for a clustered style of deployment – although it’s way more flexible than that. More on this in another article. For the sake of simplicity, let’s start with the standalone type.

Place the jdbc driver

There are 2 ways to do this. The first is really straightforward – just stick your jdbc jar file in the deployments folder indicated in the configuration file:

jboss-7.0.0.GA/standalone/configuration/standalone.xml

Relevant contents:

<subsystem xmlns="urn:jboss:domain:deployment-scanner:1.0">

	<deployment-scanner name="default" 
		scan-enabled="true" scan-interval="5000" 
		deployment-timeout="60"
		relative-to="jboss.server.base.dir" 
		path="deployments" />

</subsystem>

Stick your jdbc jar file in here, and JBoss will automatically configure your standalone.xml file for you. BTW, this deployment-scanner entry maps the location of the deployments directory:

jboss-7.0.0.GA/standalone/deployments

Where jboss.server.base.dir points to the “standalone” directory and path maps the name of the deploy folder “deployments”.

The second way is more complex and requires a little more legwork. JBoss has completely changed its class loading strategy, and if you’ve ever worked with Maven repositories it might feel very familiar. Essentially, JBoss’ modules folder is where all the jars used by the JBoss server live. By separating them onto a separate classpath, you won’t run into weird classpath errors when there are competing jar files/versions deployed by your application. This problem exposed itself in earlier versions of JBoss – in particular with the XML jars. If you had a mixed set of XML libraries, JBoss might have been using an older version that could override your application’s newer version – hard to track down if you don’t know where to look. Anyway, these jar files are organized by pseudo-packages – just like Maven repositories, except the final folder is called main. Each module jar file must be placed there and paired with a corresponding module.xml file. For example you’d want to create a folder in your install like this:

jboss-7.0.0.GA/modules/org/postgresql/main

Here is an example of module.xml:

<?xml version="1.0" encoding="UTF-8"?>
<module xmlns="urn:jboss:module:1.0" name="org.postgresql">
  <resources>
    <resource-root path="postgresql-9.0-801.jdbc4.jar"/>
  </resources>
  <dependencies>
    <module name="javax.api"/>
    <module name="javax.transaction.api"/>
  </dependencies>
</module>

You’ll want to map the file name of the jdbc driver jar here, as well as the module name – we’re going to map it to the configuration next. Once this is squared away, we’ll want to configure the standalone.xml file:

jboss-7.0.0.GA/standalone/configuration

Map and Configure

In standalone.xml, you’ll want to look for the <subsystem xmlns="urn:jboss:domain:datasources:1.0"> node and add a shiny new configuration like this:

<subsystem xmlns="urn:jboss:domain:datasources:1.0">
	<datasources>
			<datasource jndi-name="java:jboss/DefaultDS" enabled="true" 
				jta="true" use-java-context="true" use-ccm="true"
				pool-name="postgresDS" >
			<connection-url>
				jdbc:postgresql://localhost:5432/database?charSet=UTF-8
			</connection-url>
			<driver>
				org.postgresql
			</driver>
			<transaction-isolation>
				TRANSACTION_READ_COMMITTED
			</transaction-isolation>
			<pool>
				<min-pool-size>
					10
				</min-pool-size>
				<max-pool-size>
					100
				</max-pool-size>
				<prefill>
					true
				</prefill>
				<use-strict-min>
					false
				</use-strict-min>
				<flush-strategy>
					FailingConnectionOnly
				</flush-strategy>
			</pool>
			<security>
				<user-name>
					username
				</user-name>
				<password>
					password
				</password>
				</security>
			<statement>
				<prepared-statement-cache-size>
					32
				</prepared-statement-cache-size>
			</statement>
		</datasource>
		<drivers>
			<driver name="org.postgresql" module="org.postgresql">
				<xa-datasource-class>
					org.postgresql.Driver
				</xa-datasource-class>
			</driver>
		</drivers>
	</datasources>
</subsystem>

Pay attention to:

	<driver>
		org.postgresql
	</driver>

Note: you can set this to the jdbc driver file name if you’re using the deploy approach. In fact, jboss will be more than happy to write the driver configuration for you if you deploy the driver from the deploy directory.

This entry maps to the driver name configured in the <drivers> section directly below:

	<driver name="org.postgresql" module="org.postgresql">
		<xa-datasource-class>
			org.postgresql.Driver
		</xa-datasource-class>
	</driver>

The name property maps the driver to the configuration, and the module property maps to the module we laid out in the first step. I’ll point out that it seems you need to use a transaction-aware data source. I think you’re supposed to be able to use the <datasource-class> node with the regular driver class, but when I tried this I got XML parsing errors – it doesn’t seem to think “datasource-class” is a legal element.

You can look up the data source through the JNDI handle configured on the datasource node: jndi-name=”java:jboss/DefaultDS”. The rest of the properties and nodes configure various settings for your datasource, and if you’ve worked with them before you will probably be familiar with them already. If you need a refresher (like me) you can also look through the JBoss user guide documentation.
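Once that’s in place, application code deployed to the instance can grab connections from that JNDI name – a minimal sketch (the class name is arbitrary):

import java.sql.Connection;

import javax.naming.InitialContext;
import javax.sql.DataSource;

public class DataSourceLookup {

	public Connection getConnection() throws Exception {
		// look up the datasource by the jndi-name configured above
		InitialContext ctx = new InitialContext();
		DataSource ds = (DataSource) ctx.lookup("java:jboss/DefaultDS");
		return ds.getConnection();
	}
}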

References:
JBoss Wiki on Datasource configuration
JBoss user guide documentation
JBoss Wiki Getting Started Guide
JBoss Getting Started Admin Guide

Manually override and launch quartz jobs…

Override quartz settings?

So you have a quartz job that’s chugging along nicely until you’re hit with the reality that the job detail parameters change, or the job needs to be suspended, or something else happens and you end up having to recompile and redeploy your application just to update the packaged quartz job properties. This is no fun. You will undoubtedly have to take the updated code through the regular QA cycle, regression test, and then ultimately redeploy your code into the production environment. Surely there must be some way to address this problem when using JBoss…

One way I came up with was to divorce the job execution code from the job invocation, while making sure that the JobDataMap always checked an external resource before defaulting to the packaged resource within the deployed artifact. To allow for manual invocation, I also added a servlet that basically just wrapped the decoupled job invocation code in order to launch the quartz job. I also added a property to the JobDataMap – “enabled” – which I used as a flag for whether the job should fire or not. Because it would try to load an external resource before defaulting, we were then able to have complete control over the quartz job’s properties. Note that you can’t change the cron fire date by using this method – the job itself is loaded in from the minute your application fires up – to reload the job you’d have to programmatically find the existing job, destroy it and then create a new one based off the external properties. In my particular case we didn’t need to go that far, but that option is available for those that need it.

The steps:

1) stick a copy of the quartz-config.xml file in the jboss.conf.dir location: maybe something like “/jboss/server/myInstance/conf/quartz-config.xml”. This conf directory is explored in depth in the related post Jboss System Properties.

2) Rig your quartz Job class so the execute(JobExecutionContext jobContext) method simply calls a plain launch() method. By doing this you end up separating the call that launches the job from the quartz-specific entry point, so any externally invoking code can call your launch() method directly without you having to figure out how to populate a JobExecutionContext object to pass into that execute() method:

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;

public class QuartzJob implements Job {

    private static final Log log = LogFactory.getLog(QuartzJob.class);

    public void execute(JobExecutionContext jobContext)
        throws JobExecutionException {

        log.info("launching regularly scheduled quartz job");
        launch();
    }

    public void launch() {

        // your job code would go here

    }

}

3) Read in that quartz-config.xml file from the JBoss conf directory if one exists, and extract the properties from the xml file to populate your own JobDataMap object. Default it to read in the quartz-config.xml packaged in your war, jar or ear file:

public void launch() { 

  Document document = null; 
  SAXReader reader = new SAXReader(); 
  JobDataMap map = new JobDataMap(); 

  try { 

       // this section here extracts properties from the config file	    
       InputStream is = null; 
       String quartzConfig = "quartz-config.xml"; 


       try { 

	    String path = System.getProperty("jboss.server.config.url")
		 +quartzConfig;   
	    URL url = new URL(path); 

	    log.info("attempting to load " + quartzConfig + " file from: " + path); 
	    is = url.openStream(); 
	    log.info("loaded " + quartzConfig + " from URL: " + path);   

       } catch (Exception e) { 

	    is = this.getClass().getResourceAsStream("/" + quartzConfig); 
	    log.info("couldn't load " + quartzConfig + 
		 " from URL, loaded packaged from war: /"+quartzConfig); 

       } 

       document = reader.read(is); 

       String xPath =
            "/quartz/job/job-detail[name = 'myQuartzJob']/job-data-map/entry"; 
       List<Node> nodes = document.selectNodes(xPath); 
       for (Node node : nodes) { 
	    String key = ((Node) node.selectNodes("key").get(0)).getText(); 
	    String value = ((Node) node.selectNodes("value").get(0)).getText(); 

	    map.put(key, value); 
       }

    } catch (Exception e) { 
         e.printStackTrace(); 
    } 

    String enabled = map.getString("enabled");
    if (enabled != null && enabled.equalsIgnoreCase("true")) { 
  
  	    // your job code here...

    }
}

You could also just as well have hardcoded the location of your quartz-config.xml file into a java.net.URL object – and then grabbed that inputstream for xpath extraction.
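For example, something along these lines would do the trick (the path is just the example location from step 1; URL and InputStream are java.net.URL and java.io.InputStream):

// hardcoded alternative to the jboss.server.config.url lookup above
URL url = new URL("file:///jboss/server/myInstance/conf/quartz-config.xml");
InputStream is = url.openStream();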

4) Wrap your quartz Job class in an external servlet:

public class MyQuartzServlet extends GenericServlet { 
 
     private static final long serialVersionUID = 1L; 
     private static final Log log = LogFactory.getLog(MyQuartzServlet .class); 
 
     @Override 
     public void service(ServletRequest req, ServletResponse res)
          throws ServletException, IOException { 
 
          log.info("launching quartz job from servlet"); 
          QuartzJob importJob = new QuartzJob(); 
          importJob.launch(); 
           
          // forward to some jsp, and/or add other job success/fail logic here
          getServletConfig().getServletContext()
                    .getRequestDispatcher("/servlet.jsp").forward(req,res); 
           
     } 
      
}

Of course you’d have to configure the servlet and servlet-mappings in your application’s web.xml, but that should be pretty straightforward.
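For completeness, the web.xml entries would look roughly like this (the servlet name, package and URL pattern here are placeholders):

<servlet>
	<servlet-name>quartzJobServlet</servlet-name>
	<servlet-class>com.something.MyQuartzServlet</servlet-class>
</servlet>

<servlet-mapping>
	<servlet-name>quartzJobServlet</servlet-name>
	<url-pattern>/launchQuartzJob</url-pattern>
</servlet-mapping>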

Congrats, you now have a quartz job that loads an external configuration file, and that can also be invoked manually through a servlet. I’m not saying this is perfect or should be used in every possible quartz scenario, but this approach works well for quartz jobs where the properties might need overriding or temporary disabling. I can also understand the argument for why you would want to cycle every configuration change through QA regardless. I hope at least this gives some folks ideas outside the proverbial box. Now on to bigger fish to fry…

Configure ssh authorized keys for cvs access

Continuous Integration

Lately I’ve been working on adding Hudson as the continuous integration (CI) server for projects at work. The whole notion of CI merits an entire discussion, but suffice it to say it’s a very clean approach that helps automate the build process, particularly if you run manual builds that use prompted shell scripts.

After looking at a few solutions, Hudson seemed from many accounts to be the easiest to get running, and pretty flexible when integrating into an existing build system. Add to the resume that it could run in a servlet container, divorcing environmental configuration from its automated build functionality, and we suddenly have a winner.

I went to work setting up integration build scripts and projects and all kinds of cool plugins, when I finally hit a wall when it came time to wire up Hudson with cvs access. As it turns out, in our particular setup we access cvs via ssh, and ssh will usually require a password in order to connect to a remote host. When automating builds this can be quite problematic, since part of the point is to allow the builds to fire off without interactive human intervention. Prompted passwords are very capable of raining on that parade.

I dug around for what seemed like forever, until it became clear that the solution was to enable authorized key access via ssh, and to configure the generated key pair to not require a passphrase. In a nutshell, you set up a public and private key, and configure whether or not a passphrase is required when requesting access. You then copy that public key over to the correct location on the remote machines you want to enable access to. The last step is to configure authorized key access via ssh on the remote machine. Only then will you be able to ssh to the remote machine with the public key and without a password or passphrase – in essence, that public key becomes trusted authentication.

Here are the steps, with more detail:

Configure your connect-from machine

Let’s assume you’re going to use an account called builder for this example. In your shell as builder, cd into ~/.ssh and run:

ssh-keygen -f identity -C 'builder identity cvs key' -N '' -t rsa -q

This will create the key pair for you without a passphrase. The -C flag sets the comment tagged at the end of the key, and -N '' sets an empty passphrase. You want to end up with a file structure like this:

[builder@connectFrom .ssh]# ls -l iden*
-rw------- 1 jboss CodeDeploy 1675 Dec 5 09:54 identity
-rw-r--r-- 1 jboss CodeDeploy 405 Dec 5 09:54 identity.pub

on your connect-from machine. You will need to chmod the user’s home and .ssh directories to permission 0700. It turns out that these folder permissions are very picky and these keys will not work if the group or others have read/write access to that .ssh directory or its contents.

Configure your connect-to machine

You will now want to again create a ~/.ssh directory, also with permissions set to 0700, on the connect-to machine. Then use your favorite text editor to create the file ~/.ssh/authorized_keys. This one’s even more strict – ensure that the permissions on ~/.ssh/authorized_keys are set to 0600. Paste the contents of your connect-from machine’s ~/.ssh/identity.pub into this authorized_keys file. This step essentially copies the public key over as an authorized key to the remote machine. The authorized_keys file should have only one key per line, or it will cause problems. Lastly, we’ll need to make sure that the PubkeyAuthentication flag is enabled on the connect-to machine and that it reads in the correct authorized_keys file.

Edit /etc/ssh/sshd_config and uncomment the following:


PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys

Test
Now you should be able to test the ssh connection with debugging enabled by running, from the connect-from machine’s shell:

[builder@connectFrom .ssh]$ ssh -v builder@connectTo

You should see connection information useful for debugging – looking for something like this:

debug1: Next authentication method: publickey
debug1: Offering public key: /home/builder/.ssh/identity
debug1: Server accepts key: pkalg ssh-rsa blen 277
debug1: read PEM private key done: type RSA
debug1: Authentication succeeded (publickey).
debug1: channel 0: new [client-session]
debug1: Entering interactive session.
debug1: Sending environment.
debug1: Sending env LANG = en_US.UTF-8
Last login: Thu Dec 2 01:17:40 2010 from connectFrom
[builder@connectTo~]$

Configure Hudson to use the external ssh
Now that these authorized keys have been configured for use, you can go into Hudson and set up the cvs connection string. You will need to make sure that the cvs advanced configuration is set to:

$CVS_RSH: ssh

And you should be all set.

Your builder account should now be able to access the remote machine using the trusted authorized keys.
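If you want to sanity-check cvs-over-ssh outside of Hudson first, something along these lines should work from the builder account (the repository path and module name are placeholders):

[builder@connectFrom ~]$ export CVS_RSH=ssh
[builder@connectFrom ~]$ cvs -d :ext:builder@connectTo:/path/to/cvsroot checkout someModule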

Resources:
How to allow SSH host keys on Linux (Fedora 10 & CentOS 5.2)
ssh – authorized_keys HOWTO
2.4.1 Connecting with rsh and ssh

The conf directory, JBoss v5.x

Configure what you need

So you’re ready to configure some JBoss files? Great, lets have a look at the conf directory. The jboss/server/<configured instance>/conf directory is where you’ll find the majority of the configuration files for your jboss instance. For most deployments, the majority of these folders and files will remain untouched as they default to usable configurations. In this article we’ll go over the more practical configurable files, while leaving the really low level configurations alone.

`-- conf
    |-- bindingservice.beans
    |   `-- META-INF
    |       `-- bindings-jboss-beans.xml *
    |-- bootstrap
    |   |-- aop.xml
    |   |-- classloader.xml
    |   |-- deployers.xml
    |   |-- jmx.xml
    |   |-- logging.xml
    |   |-- profile.xml *
    |   `-- vfs.xml
    |-- props
    |   |-- jbossws-roles.properties
    |   |-- jbossws-users.properties
    |   |-- jmx-console-roles.properties
    |   `-- jmx-console-users.properties
    |-- xmdesc
    |   |-- AttributePersistenceService-xmbean.xml
    |   |-- ClientUserTransaction-xmbean.xml
    |   |-- JNDIView-xmbean.xml
    |   |-- Log4jService-xmbean.xml
    |   |-- NamingBean-xmbean.xml
    |   |-- NamingProviderURLWriter-xmbean.xml
    |   `-- NamingService-xmbean.xml
    |-- bootstrap.xml
    |-- jacorb.properties
    |-- java.policy
    |-- jax-ws-catalog.xml
    |-- jboss-log4j.xml
    |-- jboss-service.xml
    |-- jbossjta-properties.xml
    |-- jndi.properties
    |-- login-config.xml
    |-- standardjboss.xml
    `-- standardjbosscmp-jdbc.xml

For starters, the bootstrap.xml file lets you configure which microcontainer deployments are loaded on boot. The ones we want to pay special attention to are profile.xml and bindingservice.beans’ bindings-jboss-beans.xml (marked with an asterisk in the layout diagram). The other files are geared for low level configurations. For example – aop.xml configures how AOP is implemented in JBoss, deployers.xml configures JBoss’ classloaders, and vfs.xml tells JBoss which lib folders to load jars from.

External deployments

If you open up profile.xml, you’ll see a set of configurations that describe what kind of deployers are used for the profile object management service. Most of the contents deal with JBoss’ innards, but worth reviewing is the “applicationURIs” property. It’s actually a list of java.net.URI URLs, and you can add elements to this list in order to configure JBoss so that it looks into external deployment directories in addition to the default deploy directory. More detail is described in the related article “External deploy directories in JBoss 5.1”.

Port Bindings

Before JBoss 5, you would have to manually go in and change each port into its own numberspace. There were a bunch of places where these ports would need to be updated (for things like the naming and RMI services) across a slew of files scattered all over the place. In JBoss 5, if you open up bindings-jboss-beans.xml, you’ll find a means of binding multiple JBoss instances across different ports in a single centralized location. Out of the box, JBoss ships with four sets of port configurations. Each configuration set reserves a set of ports for JBoss’ use. If you trail through each of these port configurations, you’ll notice the sets are offset by increments of 100. So for example the first configuration reserves 1099 for the naming service, while the second set reserves 1199, the third set uses 1299, and so on. This functionality is particularly useful when you want to run more than one JBoss instance on a single IP address. Given a choice though, I would opt for using a single IP per JBoss instance. It’s nice to have this option though, in case multiple IPs are not an option.

We can get an idea of how JBoss configures these binding sets from the snippets below – both are straight out of the bindings-jboss-beans.xml file:

<!-- Provides management tools -->
<bean name="ServiceBindingManagementObject" 
  class="org.jboss.services.binding.managed.ServiceBindingManagementObject">

  <constructor>
     <!-- The name of the set of bindings to use for this server -->
     <parameter>${jboss.service.binding.set:ports-default}</parameter>

     <!--  The binding sets -->
     <parameter>
	 <set>
	    <inject bean="PortsDefaultBindings"/>
	    <inject bean="Ports01Bindings"/>
	    <inject bean="Ports02Bindings"/>
	    <inject bean="Ports03Bindings"/>
	 </set>
     </parameter>

     <!-- Base binding metadata used to create bindings for each set -->
     <parameter><inject bean="StandardBindings"/></parameter>

  </constructor>
</bean>

...

<!-- bindings are obtained by taking the base bindings and adding offsets  -->
<bean name="Ports01Bindings"  
	class="org.jboss.services.binding.impl.ServiceBindingSet">
	<constructor>
	<!--  The name of the set -->
		<parameter>ports-01</parameter>
	<!-- Default host name -->
		<parameter>${jboss.bind.address}</parameter>
	<!-- The port offset -->
		<parameter>100</parameter>
	<!-- Bindings to which the "offset by X" approach can't be applied -->
		<parameter><null/></parameter>
	</constructor>
</bean>

If you run JBoss without a ports configuration, it will use the default port settings. If you want to use a specific port configuration, all you need to do is add a startup param to the run.sh script used to invoke JBoss: “-Djboss.service.binding.set=ports-01”. This param selects which port config to use, and likewise can be used to select any of the available port bindings found in bindings-jboss-beans.xml.
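For example, starting the “default” server instance against the ports-01 set would look something like this:

[root@bedrock bin]# ./run.sh -c default -Djboss.service.binding.set=ports-01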

The props folder – default UsersRolesLoginModule properties for jmx-console security

If you make use of jmx-console’s default UsersRolesLoginModule JAAS security domain configuration, you’ll find that the user and role properties are stored in this “props” folder. More on securing the jmx console can be found here. You can also opt to use a different security module, by defining a different login module in login-config.xml. More on this in a few paragraphs.
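The format of those properties files is just user=password and user=role1,role2 pairs, along these lines (the credentials are obviously placeholders):

# jmx-console-users.properties
admin=someSecretPassword

# jmx-console-roles.properties
admin=JBossAdmin,HttpInvoker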

The xmdesc folder – mbean descriptors

This folder contains all the descriptors used for the major mbeans described in the jboss-service.xml. Not all the possible mbeans have been converted over to the new bootstrapping style format, so these legacy descriptors close the gap to allow existing mbean services to continue to work.

Logging with jboss-log4j.xml

This piece allows you to configure the logging for the entire instance. If you’re familiar with log4j, you’ll know right off the bat how to configure the logging. Out of the box JBoss ships with console output enabled. For production level environments though, we’ll want that disabled, as console logging always takes up unnecessary system resources. You are able to control smtp, console, jms, and file based logging with levels (DEBUG, INFO, WARN, ERROR, FATAL etc) and by category (com.yourpackage.class or com.yourpackage). Asynchronous appenders allow you to configure logging that gets piped to more than one target. Learn more about how to configure this from Apache’s Log4j project website. A basic log4j configuration can be found in the example below:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/" 
      debug="false">
   <!-- A time/date based rolling appender -->
      <appender class="org.jboss.logging.appender.DailyRollingFileAppender"
        name="FILE">
      <errorHandler class="org.jboss.logging.util.OnlyOnceErrorHandler"/>
      <param name="File" value="${jboss.server.log.dir}/server.log"/>
      <param name="Append" value="true"/>
       
      <!-- Rollover at midnight each day -->
      <param name="DatePattern" value="'.'yyyy-MM-dd"/>

      <layout class="org.apache.log4j.PatternLayout">
         <!-- Default pattern: Date Priority [Category] (Thread) Message\n -->
         <param name="ConversionPattern" value="%d %-5p [%c] (%t) %m%n"/>
      </layout>
   </appender>

   <root>
	<priority value="${jboss.server.log.threshold}"/>
	<appender-ref ref="FILE"/>
   </root>
</log4j:configuration>

JAAS/Security Domains with login-config.xml

This file allows you to configure security domains with JAAS for your applications. You have a number of login modules to choose from, notably the DatabaseServerLoginModule. This module allows you to hook a users-and-roles query into a datasource to validate authentication credentials and allow access to your application. In a default install, JBoss will have example configurations where users and roles are stored in the props directory and accessed via the UsersRolesLoginModule.

An example JAAS database login module configuration in login-config.xml:

<?xml version="1.0" encoding="UTF-8"?>
<deployment xmlns="urn:jboss:bean-deployer:2.0">
  <application-policy xmlns="urn:jboss:security-beans:1.0" name="loginForm">
    <authentication>
      <login-module code="org.jboss.security.auth.spi.DatabaseServerLoginModule"
        flag="required">
        <!--  BASE64 also possible -->
        <module-option name="hashAlgorithm">MD5</module-option> 
        <module-option name="unauthenticatedIdentity">guest</module-option>
        <module-option name="dsJndiName">java:/DefaultDS</module-option>
        <module-option name="principalsQuery">
        	SELECT password FROM User WHERE username=?
        </module-option>
        <module-option name="rolesQuery">
        	SELECT role, 'Roles' 
        	FROM UserRoles, User 
        	WHERE User.username=? 
        		AND User.id = UserRoles.user_id
        </module-option>
      </login-module>
    </authentication>
  </application-policy>
</deployment>

In order for this configuration to work, we’ll also need to add security domains to our web application’s web.xml as well as the JBoss-only jboss-web.xml deployment descriptor. More on how to configure this can be found in JBoss’ server manual, section 9.6.4.

The remaining files

The last remaining files cover various integration configurations, and delve into JBoss’ low level implementation. One example is the java.policy file, which configures your instance’s security policy at the JVM level. You may constrain permissions to allow strictly reading across the board, or some custom mix of read and write, by class. More information can be found on JBoss Web’s security manager page. For the uninitiated, JBoss Web is the integrated version of Tomcat 6 that deploys with JBoss as a deployable service.

The jbossjta-properties.xml file configures the Jboss transaction server’s default behavior. The jax-ws-catalog.xml maps xml metadata for use with jax-ws webservices, as specified by the Oasis XML Catalog spec. The jndi.properties file maps classes used by Jboss’ naming server. The jacorb.properties file configures JacORB (Java implementation of OMG’s Corba standard) for use with Jboss.

Finally, the standardjboss.xml config defines how the various types of enterprise java beans are configured during regular use, while standardjbosscmp-jdbc.xml configures specific persistence dialects relating to type mappings, so that datasource files can communicate correctly with different database vendors. In your datasource file you can add a type-mapping element to specify which of these defined types JBoss should use:

<?xml version="1.0" encoding="UTF-8"?>  
<datasources>   
     <local-tx-datasource> 
        <jndi-name>DefaultDS</jndi-name> 
        <connection-url>jdbc:postgresql://localhost:5432/db</connection-url> 
        <driver-class>org.postgresql.Driver</driver-class> 
         <user-name>user</user-name>   
         <password>password</password> 
          <metadata> 
               <type-mapping>PostgreSQL</type-mapping> 
          </metadata>          
     </local-tx-datasource> 
</datasources>

The recap

Once you’ve gone over these configuration files, you’ll have seen with some depth part of what JBoss is capable of. As mentioned earlier, you won’t want to muck around with most of these configurations unless you’re working on JBoss server code itself or unless you want to tweak the server’s behavior. The most commonly edited files end up being bindings-jboss-beans.xml (multiple port bindings), profile.xml (external deploy directories), jboss-log4j.xml (logging), login-config.xml (security domains) and java.policy (JVM level security permissions). With these basics you will be well on your way to configuring JBoss for whatever needs you have to fill.

Resources:
Securing the JMX Console
JBoss App server Quick Tour
JBoss Web manual
Oasis XML Catalog project

Apache XSL-FO’ sho v1.0

Transforming XML into PDFs.. and stuff

If you’ve ever been tasked with providing PDF documents via XSL, you’ve surely done some homework and shopped around for viable third party libraries. Some are good, some are great and rightly charge a price, and some are just flat out incomplete or shanty in their documentation. It’s not a knock on anyone, it’s just a fact well known to open source developers. Historically, what has been missing is an open standard for PDF generation, and possibly other output formats.

Enter XSL-FO: XSL Formatting Objects is an open standard for formatting documents in order to produce media artifacts such as PDF, PostScript (PS), rich text format (RTF), and PNG files. Because it’s XML centric, you can marry your XML data to an XSL-FO stylesheet and perform a transformation that will output a file in any of these formats or others. XSL-FO is simply the XSL dialect used to lay out the document, and Apache FOP is the open source, java based software you can use to process those transformations.

Apache FOP has been slowly making its complete debut over the past 3 years. Version 1.0 was finally released around the 12th of July 2010, so it’s essentially a fresh release. Before that, 0.95 was the closest thing to production ready, but now that 1.0 is out, a more complete implementation awaits. There are still a few loose ends to tie up though – a complete rundown of FO compliance can be found on Apache’s XSL-FO compliance page.

On with the examples:

The XML data

<block>
	<date>july 27th, 2010</date>
</block>

This is a very simple xml document, which we will be reading from in order to stamp the date onto a pdf document.

The XSL-FO layout

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet 
	version="1.0"
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
	xmlns:fo="http://www.w3.org/1999/XSL/Format">
	
	<xsl:template match="/">
			
	<fo:root font-family="Verdana" font-size="12pt" text-align="center"
	    xmlns:fo="http://www.w3.org/1999/XSL/Format"
	    xmlns:fox="http://xml.apache.org/fop/extensions">
	
	<fo:layout-master-set>
	  <fo:simple-page-master master-name="master">
		<fo:region-body margin="0in"
	  		background-image="http://my.images.com/banner.jpg"
			background-repeat="no-repeat"
			background-position="center"  />
	  </fo:simple-page-master>
	</fo:layout-master-set>
	
	<fo:page-sequence master-reference="master">

	  <fo:flow flow-name="xsl-region-body">
		  
		<fo:block 
          	margin-top="50px"
          	margin-left="200px">
			Today's XML date is: <xsl:value-of select="/block/date"/>
		</fo:block>
		  
	  </fo:flow>
	</fo:page-sequence>
	
	</fo:root>
	  
  </xsl:template>

</xsl:stylesheet>

This is the XSL-FO layout we’ll be using to stamp on the pdf. It’s marked up using regular XSL-FO. Covering the syntax of XSL-FO is beyond the scope of this article, but there are plenty of resources and tutorials online such as the W3Schools.com XSL-FO and Renderx.com tutorials.

On with the java

Finally, we come to the java code and apache’s fop usage:

	protected void export() throws  IOException {
	
	    //Setup the output stream the rendered PDF will be written to
		FileOutputStream out = new FileOutputStream("C:/image/layout.pdf");
		
		try {
		    
			// generic files to String XML and XSL
			String xml = FileUtils.readFile("C:/image/banner.text.xml");
			String xsl = FileUtils.readFile("C:/image/banner.layout.fo"); 
			
	        // configure fopFactory as desired
	        FopFactory fopFactory = FopFactory.newInstance();
	        TransformerFactory factory = TransformerFactory.newInstance();
	        
		    //Setup FOP
		    Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, out);
	
		    //Setup Transformer
		    Source xsltSrc = new StreamSource(new StringReader(xsl));
		    Transformer transformer = factory.newTransformer(xsltSrc);
	
		    //Setup input
		    Source src = new StreamSource(new StringReader(xml));
	
		    //Make sure the XSL transformation's result is piped through to FOP
		    Result res = new SAXResult(fop.getDefaultHandler());	    
		    
		    //Start the transformation and rendering process
		    transformer.transform(src, res);
		    
		} catch (Exception e) {	
			e.printStackTrace();
		} finally {
			out.close();
		}
	}

Pretty straightforward XSLT-looking code. But what if we want to override the FOP PDF generation defaults? What if we want to produce a document that isn’t a regular PDF page size, like a banner, or if we want to produce a PNG image? Luckily, FOP offers a factory configuration mechanism we can use to customize the outputs.

Rendering the output as a PNG file

The java code is pretty much the same thing, with some small differences. First you’ll want to invoke the fopFactory.setUserConfig() method on the FopFactory object, handing it your configuration file. This will flag Apache FOP to load a custom configuration from the specified file. Secondly you’ll need to set the exporting MIME type to MimeConstants.MIME_PNG, as shown in the java code snippet below.

// configure fopFactory as desired
FopFactory fopFactory = FopFactory.newInstance();
fopFactory.setUserConfig(new File(rootPath + "export.conf.xml"));
TransformerFactory factory = TransformerFactory.newInstance();

//Setup FOP
Fop fop = fopFactory.newFop(MimeConstants.MIME_PNG, out);

Lastly, you’ll want to define your export.conf.xml file. The only things that stray from the defaults here are the exported object’s dimensions (set in the example below to 150px high by 900px wide) and the renderer element that defines an “image/png” type. This renderer block flags the processor to export as PNG. At the moment the only other image export format is TIFF, but between these two, most purposes are likely met. It’s worth mentioning that FOP supports export into PostScript, PCL, AFP, RTF, XML, and TXT to name a few. More details can be found on Apache FOP’s Output Targets page. Here’s the source:

<?xml version="1.0"?>

<fop version="1.0">

	<!-- Base URL for resolving relative URLs -->
	<base>.</base>

	<!--
		Source resolution in dpi (dots/pixels per inch) for determining the
		size of pixels in SVG and bitmap images, default: 72dpi
	-->
	<source-resolution>72</source-resolution>
	<!--
		Target resolution in dpi (dots/pixels per inch) for specifying the
		target resolution for generated bitmaps, default: 72dpi
	-->
	<target-resolution>72</target-resolution>

	<!--
		Default page-height and page-width, in case value is specified as auto
	-->
	<default-page-settings height="150px" width="900px" />

	<!-- Uses renderer mime type for renderers -->
	<renderers>

		<renderer mime="image/png">
		  <transparent-page-background>false</transparent-page-background>
		  <fonts><!-- described elsewhere --></fonts>
		</renderer>

	</renderers>

</fop>

So if you want to export to a different format, all you’d need to do is use a custom configuration and set the renderer formats to match the one you’d like to use, as well as override any default document properties you wish.

By leveraging an open standard like XSL-FO you can use different vendors for your pdf generation code, and while Apache’s FOP implementation isn’t 100% complete in its support for XSL-FO, it does do a good job of supporting what most folks will need on a daily basis. It’s nice to see a complete version release after a long wait.

Resources:
Apache FOP website. v1.0 finally released on 7/12/2010, yay!
Apache FOP compliance guide
XSL-FO Object Model documentation
Renderx.com tutorial on XSL-FO

There’s also the ultimate XSL-FO list of resources:
Whoishostingthis.com xsl-fo Resources

Sardine powered webdav client?

Extra Sardines on my pizza please

A few days ago I came across the need for an easy to use webdav client. Currently we’re using Jakarta Slide, which as it turns out is a project that was discontinued (as of fall 2007!), and whose code base as of this writing is practically 10 years old. Who wants those jars collecting dust in their lib directories? Sure it works, but hey, I’m trying to keep up with the Joneses here – I’d like an up-to-date library that hasn’t been discontinued.

Dismayed, I took a look at the replacement suggested by the Jakarta site – the Jackrabbit project, which is a java based content repository API implementation (JCR, as outlined in JSR 170 and 283). Uh.. I’m not really looking to integrate a full fledged content repository into my project just so I can access some files on a webdav server. If I were building a CMS though, I’d be way more interested. All I was looking for was an easy way to access files on a webdav server.

Next I found Apache’s commons-vfs project but I was disappointed to find this note regarding webdav: “.. We can’t release WebDAV as we depend on an snapshot, thus it is in our sandbox.” (full page here, skip to “Things from the sandbox” ). Dammit! Guess I’ll have to keep looking..

Finally, I stumbled across Google’s Sardine project, an oasis in a desert of mismatched suitors. I practically feel guilty about rehashing what’s already well documented, but I am compelled, if only to underscore the ease of use.

Classpath Dependencies

At the minimum you’ll need commons-logging.jar, commons-codec.jar, httpcore-4.0.1.jar and httpclient-4.0.1.jar if you’re on Java 6+. If you’re on Java 5 you’ll also need JAXB 2.1 and its dependencies. Luckily for you, the authors have included links to the JAXB jars and have included the other jars in the Sardine distribution, so you can easily add them to your classpath.

Code Examples

Using Sardine is really simple, and pretty self explanatory. You must first call SardineFactory.begin() to initiate the webdav session. If you don’t have authentication enabled, you don’t need to provide the username/password parameters.

public List<DavResource> listFiles() throws SardineException {

	log.debug("fetching webdav directory");

	Sardine sardine = SardineFactory.begin("username", "password");
	List<DavResource> resources = sardine.getResources("http://webdav/dir/");

	return resources;
}

This List of DavResource objects is essentially metadata about the webdav files, which you can then use to perform whatever tasks you need.

Grabbing the contents of a file is just as easy:

	public InputStream getFile(String fullURL) throws SardineException {

		log.info("fetching webdav file");

		Sardine sardine = SardineFactory.begin("username", "password");
		return sardine.getInputStream(fullURL);
	}

as is saving a file:

	public void storeFile(String filePath) throws IOException {
		
		Sardine sardine = SardineFactory.begin("username", "password");
		byte[] data = FileUtils.readFileToByteArray(new File(filePath));
		sardine.put("http://webdav/dir/filename.jpg", data);
	}

checking if a file exists:

	public boolean fileExists(String filePath) throws IOException {
		
		Sardine sardine = SardineFactory.begin();
		if (sardine.exists(filePath)) {
			return true;
		}

		return false;
	}

Other code examples can be found in the user guide on the Sardine project page, covering deleting files, moving files from one place to another, copying files so you end up with two, and creating directories.
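Deleting, for instance, is just as terse as the rest of the API (the URL is a placeholder, as above):

	public void deleteFile(String fullURL) throws SardineException {

		Sardine sardine = SardineFactory.begin("username", "password");
		sardine.delete(fullURL);
	}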

Overall, Sardine is simple, elegant, easy to use and pretty darned sexy, so check it out. I guess it’s time to update all that jakarta API related code…