Using find & grep to search Subversion working copies

Alhamdulillaah, I was able to write a very simple bash function that will efficiently scan an entire directory tree for files which contain a string and display those lines, along with line numbers, and highlight the searched term in the terminal. The result can be found below:

# Search every file below the current directory for "$1", skipping .svn directories.
function search
{
    time find . -name '.svn' -prune -o -exec grep -inH --color -e "$1" '{}' \;
}

The search function takes as its only parameter the string for which you are searching. Three standard *nix programs are utilized – time, find, & grep.

The time program is completely optional, and I just like to use it on most commands I execute in the terminal because I like to see how my different parameters affect general execution time.

There are seven arguments to the find command, the last of which is the call to the grep command, which will be explained later. The first argument to find, ‘.’, simply tells find “start searching in the current directory”. By default, find also descends recursively into subdirectories, and this is the behavior that I was looking for. The second argument, ‘-name’, states that I want to match according to a file name or pattern. The third argument, ‘.svn’, is the pattern which I am trying to match: in this case, the special directories that hold Subversion's own bookkeeping information. The fourth argument, ‘-prune’, tells find not to descend into a directory that matched the previous pattern, so the contents of the ‘.svn’ directories are never visited.

The fifth argument, ‘-o’, is a logical OR between the expression on its left and the expression on its right. find evaluates the left side first and only evaluates the right side when the left side is false. So, if a file is named ‘.svn’, it is pruned and find moves on to the next file; otherwise find goes on to evaluate what follows, which is the sixth argument, ‘-exec’. This tells find that I want to execute a command for every file that reaches it (in this case, every file that did not match ‘.svn’).
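To see the prune-or-act logic on its own, here is a minimal sketch that swaps the grep call for a plain -print. The directory layout and file names are made up for the demonstration, and the extra -type f test (not part of my function) is only there so that directories are left out of the listing.

```shell
#!/usr/bin/env bash
# Build a throwaway tree that mimics a working copy (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/src/.svn" "$demo/.svn"
echo 'hello' > "$demo/src/main.c"
echo 'hello' > "$demo/src/.svn/entries"
echo 'hello' > "$demo/.svn/entries"

# Left of -o: anything named .svn is pruned and never entered.
# Right of -o: everything else that is a regular file gets printed.
pruned_listing=$(cd "$demo" && find . -name '.svn' -prune -o -type f -print)
echo "$pruned_listing"

# Remove the throwaway tree again.
rm -rf "$demo"
```

Only ./src/main.c is listed; both entries files sit inside pruned directories, so find never reaches them.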

The seventh argument is the command that find will run on every such file, which is a call to the grep command. Specifically, the arguments to grep are -i, which means search without regard to case (case-insensitivity), -n, which tells grep to print the line number with every match, -H, which tells grep to print the filename for every match, --color, which tells grep to highlight the searched term in the matched line, and -e, which tells grep that the next argument is the pattern to search for. grep treats the pattern as a regular expression by default; -e simply makes sure that a search term beginning with a dash is not mistaken for an option. “$1” tells bash to insert the first argument passed to the function (search) on the command line, which in this case is the actual term for which we are searching.
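The grep flags can be tried out by themselves. This is a small sketch against a temporary file whose contents I invented for the example; --color is left out because the output is captured into a variable rather than shown on a terminal.

```shell
#!/usr/bin/env bash
# A made-up three-line file to search through.
sample=$(mktemp)
printf 'SELECT d.EndDate\nFROM deals d\nWHERE d.enddate IS NULL\n' > "$sample"

# -i ignores case, -n adds the line number, -H adds the file name.
matches=$(grep -inH -e 'enddate' "$sample")
echo "$matches"

# -e also lets a term that starts with a dash be passed safely as the pattern.
dash_matches=$(printf 'a -x b\n' | grep -c -e '-x')
echo "lines containing -x: $dash_matches"

rm -f "$sample"
```

Lines 1 and 3 are reported, each prefixed with the file name and its line number, even though the capitalization differs between them.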

The last part of the find command has two pieces, “'{}'” and “\;”. “'{}'” is the placeholder that find replaces with the name of the current file when running a command through -exec. The reason I put the curly brackets in quotes was based on a tip from the find manpage, which states that curly brackets may be interpreted by the shell. Putting them in quotes prevents this. The final bit is the escaped semi-colon, “\;”. This tells the -exec argument of the find command that there are no more arguments to pass to the command that find should run for each file. Once again, because a semi-colon is special to the shell (it separates commands), escaping it with a backslash prevents the shell from interpreting it, so that it is passed through to the find command.
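One consequence of ending -exec with “\;” is that the command is started once for every file. find also accepts “+” as a terminator, which hands many file names to a single run of the command. The sketch below only illustrates the difference, using echo and two made-up files; it is not what my function does.

```shell
#!/usr/bin/env bash
# Two empty files in a throwaway directory (hypothetical names).
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.txt"

# With \; the command runs once per file, so echo prints two lines.
per_file=$(find "$demo" -type f -exec echo run '{}' \; | grep -c '^run')

# With + find appends all the file names to one command line,
# so echo runs once and prints a single line.
batched=$(find "$demo" -type f -exec echo run '{}' + | grep -c '^run')

echo "per-file runs: $per_file, batched runs: $batched"

rm -rf "$demo"
```

On a large working copy, the batched form should start far fewer grep processes, though I have not timed the two variants against each other.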

It’s important to note that there are some caveats due to the special meaning of certain characters. grep treats the search term as a regular expression, so characters such as “.”, which hold special meaning within regular expressions (“.” means “match any character here”), need to be escaped with a backslash if you want to match them literally. However, the backslash is also special to the shell: if the argument to the search function is not quoted, the shell consumes one backslash before grep ever sees it. That means, to include a literal “.” in your search, you will have to escape it twice (d\\.enddate) or wrap the term in single quotes ('d\.enddate'). Just a tip I found out when doing a search for “d.enddate” (a field in a SQL database query).
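The escaping rules are easier to see side by side. This sketch counts matching lines in a two-line temporary file of my own invention; the last variant uses grep's -F option, which is not part of my function and treats the whole term as a fixed string.

```shell
#!/usr/bin/env bash
# One line with a real dot, one line with another character in its place.
sample=$(mktemp)
printf 'd.enddate\ndXenddate\n' > "$sample"

# Unescaped: "." matches any character, so both lines match.
loose=$(grep -c -e 'd.enddate' "$sample")

# Single quotes pass the backslash to grep untouched: only the real dot matches.
quoted=$(grep -c -e 'd\.enddate' "$sample")

# Unquoted, the shell eats one backslash, so two are needed to deliver one.
doubled=$(grep -c -e d\\.enddate "$sample")

# -F switches regular expressions off, so no escaping is needed at all.
fixed=$(grep -c -F -e 'd.enddate' "$sample")

echo "$loose $quoted $doubled $fixed"

rm -f "$sample"
```

This prints 2 1 1 1: only the unescaped pattern also matches the line without a dot.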

So there you have it! A very fast & efficient way to search the contents of all files within a Subversion working copy without also having to go through the special .svn directories, which frequently clutter the results. Previously, I had been using only the grep command with the addition of the -r argument, which tells it to recursively descend into directories. This works well, but on its own it also descends into the .svn directories. Combining grep with find, however, allows me to get exactly what I want. Alhamdulillaah.
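For comparison, newer releases of GNU grep can skip directories by themselves through the --exclude-dir option. The sketch below assumes GNU grep is the grep installed on your system (other implementations may not have this option) and uses a made-up tree.

```shell
#!/usr/bin/env bash
# A throwaway working copy with one real file and one .svn file (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/src/.svn"
echo 'enddate' > "$demo/src/query.sql"
echo 'enddate' > "$demo/src/.svn/entries"

# -r recurses; --exclude-dir skips every directory named .svn (GNU grep assumed).
hits=$(cd "$demo" && grep -rinH --exclude-dir='.svn' -e 'enddate' .)
echo "$hits"

rm -rf "$demo"
```

Only the match in query.sql is reported. The find version still has the advantage of working where that option is not available.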
