Keys in threes...

Monday, December 13, 2010

Diff-ing .zip files with FileMerge

Cross-posted on StackOverflow:

This isn't particularly elegant, but you can use the FileMerge application that comes with Mac OS X developer tools to compare the contents of zip files using a custom filter.

Create a script ~/bin/zip_filemerge_filter.bash with contents:

#!/bin/bash
##
#  List the size, CR-32 checksum, and file path of each file in a zip archive,
#  sorted in order by file path.
##
unzip -v -l "${1}" | cut -c 1-9,59-,49-57 | sort -k3
exit $?
Make the script executable (chmod +x ~/bin/zip_filemerge_filter.bash).

Open FileMerge, open the Preferences, and go to the "Filters" tab. Add an item to the list with: Extension:"zip", Filter:"~/bin/zip_filemerge_filter.bash $(FILE)", Display: Filtered, Apply*: No. (I've also added the filer for .jar and .war files.)

Then use FileMerge (or the command line "opendiff" wrapper) to compare two .zip files.

This won't let you diff the contents of files within the zip archives, but will let you quickly see which files appear within one only archive and which files exist in both but have different content (i.e. different size and/or checksum).

Monday, March 16, 2009

Logging JSch with Apache Commons Logging

I recently created a Java application at work which used JSch for copying files via SFTP. For the most part, JSch has worked very well for my needs. However, I ran into some problems with the way it handles timeouts, and wanted to debug JSch's behavior. I use Apache Commons Logging (usually with Log4j) in all my applications, and was a bit disappointed to discover that JSch doesn't use Commons Logging, but instead uses it's a custom com.jcraft.jsch.Logger interface. I ended writing my own simple Logger class which wraps around Apache Commons Log. This works, but has a couple of caveats due to the limitations of JSch's logging design:

JSch uses a single instance of Logger which is shared by all JSch classes, so all messages logged by JSch classes are logged with a single class/category name.

JSch's Logger instance is a static property of the JSch class. Therefore, you can't configure configure different Logger instances for individual JSch connections.


import com.jcraft.jsch.Logger;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

/**
 * Class to route log messages generated by JSch to Apache Commons Logging.
 * 
 * @author mrabbitt
 * @see com.jcraft.jsch.Logger
 */
public class JSchCommonsLogger implements Logger {
    private Log log;
    
    /**
     * Constructor with custom category name 
     * 
     * @param logName the category name used by this logger for Apache Commons.
     */
    public JSchCommonsLogger(String logName) {
        log = LogFactory.getLog(logName);
    }
    
    /**
     * Default constructor
     */
    public JSchCommonsLogger() {
        this(Logger.class.getName());
    }

    /* (non-Javadoc)
     * @see com.jcraft.jsch.Logger#isEnabled(int)
     */
    public boolean isEnabled(int level) {
        switch (level) {
        case DEBUG:
            return log.isDebugEnabled();
        case INFO:
            return log.isInfoEnabled();
        case WARN:
            return log.isWarnEnabled();
        case ERROR:
            return log.isErrorEnabled();
        case FATAL:
            return log.isFatalEnabled();
        }
        return false;
    }

    /* (non-Javadoc)
     * @see com.jcraft.jsch.Logger#log(int, java.lang.String)
     */
    public void log(int level, String message) {
        switch (level) {
        case DEBUG:
            log.debug(message);
            break;
        case INFO:
            log.info(message);
            break;
        case WARN:
            log.warn(message);
            break;
        case ERROR:
            log.error(message);
            break;
        case FATAL:
            log.fatal(message);
            break;
        }
    }

}

To use this, you'll need to do something like this somewhere in your code:


private static JSchCommonsLogger jschLogger = new JSchCommonsLogger();
com.jcraft.jsch.JSch.setLogger(jschLogger);

If you specify no argument to the JSchCommonsLogger() constructor, messages will be logged as com.jcraft.jsch. So, for example: If you're using Log4j, you can put something like this into a log4.properties to log errrors:

log4j.category.com.jcraft.jsch=ERROR

Testing SyntaxHighlighter (videoPodcastGrabber.groovy)

I've set up my blog to use SyntaxHighlighter 2.0. Rather than just test it out with a "Hello World," here's a Groovy script I wrote a couple of years ago after attending JavaOne 2007 to download the a bunch of conference videos from a video podcast feed:


#!/usr/bin/env groovy

/**
 * Downloads a remote file to a local file.
 *
 * @param url URL representing remote file
 * @param localPath full path where file is to be stored locally
 * @param expectedSize the expected size of the file in bytes
 * @return two-item array:<ol>
 *      <li>true if succesful, false if error occurred
 *      <li>message describing success or failure
 */
def downloadFile(url, localPath, Long expectedSize) {
    def isOk = false
    def message = ''

    def file = new File(localPath)

    if (file.exists()) {
        actualSize = file.length()

        // If file exists and it's exactly the expected size, then we're done with this file.
        if (actualSize == expectedSize) {
            return [true, 'File with expected name and size already exists.']

        // If file already exists and it's bigger than expected, that's just weird.  Give up.
        } else if (actualSize > expectedSize) {
            return [false, "File ${localPath} exists with ${actualSize} bytes, bigger than expected ${expectedSize}; skipping."]

        // If file already exists and it's smaller than expected, assume it's an incomplete download and remove it.
        } else if (actualSize < expectedSize) {
            // TODO Switch to use resumable download instead of starting over.
            file.delete()
        }
    }

    // Download the file.
    def out = new BufferedOutputStream(new FileOutputStream(localPath))

    URL urlObj = new URL(url)
    try {
        out << urlObj.openStream()

        def actualSize = file.length()
        if (actualSize == expectedSize) {
            isOk = true
            message = 'Success.'
        } else {
            isOk = false
            message = "Got ${actualSize} bytes but expected ${expectedSize}!"
        }

    } catch(FileNotFoundException e) {
        isOk = false
        message = "Could not download ${url}"
    }

    out.close()
    return [isOk, message]
}

class VideoPodcastItem {
    String title
    String externalLink
    String filePath
    String description
    Date pubDate
    String guid

    String toString() {
        String fileExtension = filePath.tokenize('.')[-1]
        return """\
Title: <a href="${externalLink}">${title}</a><br>
Description: ${description}<br>
Media: [<a href="${filePath}">${fileExtension}</a>]"""
    }
}

// Define some settings
def nonSafeRegex = /[^\w]+/
def fileExtension = '.mp4'
def destDir = '.'
def feedUrl = 'http://public-xml.feedroom.com/public_rss/sun_podcast_rss.xml?channel_id=179d1f683cee84e1c425b1aa164b652b7602b131'
def includedItems = { it.enclosure.@type == 'video/mp4' }

// Fetch and parse XML w/ Groovy's XmlSlurper class
def rssNode = new XmlSlurper().parse(feedUrl)

// Traverse node tree to get list of MP4 video attachments
assert rssNode.channel.size() == 1
def channelNode = rssNode.channel[0]
assert channelNode.item.size() > 0
def items = channelNode.item
println "got ${items.size()} items"
def itemsWithVideo = items.grep(includedItems)
assert itemsWithVideo.size() > 0
println "got ${itemsWithVideo.size()} items with video"

// Download the attached video files.
def successCount = 0
processedItems = itemsWithVideo.collect{
    item = new VideoPodcastItem()
    item.title = it.title[0].text().trim()
    item.description = it.description[0].text().trim()
    item.externalLink = it.link[0].text().trim()
    def videoUrl = it.enclosure[0].@url.text().trim()
    item.guid = it.guid[0].text().trim()
    Long videoByteLength = Long.parseLong(it.enclosure[0].@length.text().trim())
    item.filePath = destDir + File.separatorChar + item.title.replaceAll(nonSafeRegex, '_') + '-' + item.guid + fileExtension
    println "'${item.title}' => '${videoUrl}' => '${item.filePath}' (${videoByteLength} bytes)"

    def result = downloadFile(videoUrl, item.filePath, videoByteLength)
    // TODO Come up with a more readable way which doesn't rely on arbitrary
    //      numeric indices to to pass the results back from the function.
    if (result[0]) {
        println "Done: ${result[1]}"
        successCount++
    } else {
        println "ERROR: ${result[1]}"
    }
    println ''

    item
}
assert processedItems.size() == itemsWithVideo.size()

// Construct HTML document linking to downloaded files:
indexHtmlFilePath = destDir + File.separatorChar + 'index.html'
indexHtmlFile = new BufferedOutputStream(new FileOutputStream(indexHtmlFilePath))

indexHtmlFile << """\
<html>
<head>
    <title>${rssNode.title[0].text().trim()}</title>
</head>
<body>

<p id="feedDescription">
${channelNode.description[0].text()}
</p>
"""

processedItems.each{
    indexHtmlFile << """\
<p class="videoPodcastItem">
${it}
</p>
"""
}

indexHtmlFile << """\
</body>
</html>
"""

indexHtmlFile.close()
println "Wrote: ${indexHtmlFilePath}"

// Display summary
println "== ${successCount} out of ${itemsWithVideo.size()} successfully download. =="

Sunday, March 8, 2009

>>start<<

I'm experimenting with writing a public blog to share some of my programming ideas. Lately, my favorite language to play with has been Groovy, both for scripting and for web programming with Grails. At work, I've been working a lot with Java and the Spring framework. I also have a lot of miscellaneous Perl, Python, and Bash scripts lying around that may be worth sharing.

For the moment, I'm using Blogger, but would like to give Gravl a try sometime soon.