Checking checksums

How many times have you downloaded a boot image or compressed ISO file and written it to an SD card or CD without checking the integrity of the file?

You really should – and it isn’t very hard.

Using a hashing algorithm like SHA-1 or MD5 allows you to verify the integrity of the file by computing its SHA-1 or MD5 checksum and comparing it to a known value. Some sites show the checksums on the download page and others provide a separate checksum file, usually when there are multiple files.

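Note – By itself the checksum only verifies the integrity of the file: it shows that the file has not been corrupted or altered since the checksum was generated, but not who produced the file or the checksum.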

Using SHA-1

To generate the SHA-1 checksum for a file (or group of files):

$ sha1sum *.zip
1852df83a11ee7083ca0e5f3fb41f93ecc59b1c8  archive.zip
7d6d71265d8d0da32d82e0137f08e3f2998a7d95  download.zip

If the download page shows the checksum, then once you have downloaded the file all you need to do is check that it matches the checksum of your copy. You can also save the checksums in a file and use it later to verify that the files have not been modified.

$ sha1sum *.zip > checksum.sha
$ cat checksum.sha
1852df83a11ee7083ca0e5f3fb41f93ecc59b1c8  archive.zip
7d6d71265d8d0da32d82e0137f08e3f2998a7d95  download.zip

$ sha1sum -c checksum.sha
archive.zip: OK
download.zip: OK
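
When the download page shows the checksum as plain text rather than offering a checksum file, the comparison can still be left to sha1sum instead of being done by eye. The following is a minimal sketch assuming GNU coreutils; verify_sha1 is a hypothetical helper name, not a standard command, and the hash passed to it would be the one copied from the download page.

```shell
# verify_sha1 FILE EXPECTED_HASH
# Hypothetical helper (not part of coreutils): succeeds only if FILE
# has the given SHA-1 checksum.
verify_sha1() {
    # sha1sum -c expects lines of the form "HASH  NAME" (two spaces);
    # the trailing "-" tells it to read those lines from standard input.
    printf '%s  %s\n' "$2" "$1" | sha1sum -c -
}
```

Called as verify_sha1 archive.zip followed by the published hash, this reports OK for a matching file and FAILED, with a non-zero exit status, for one that does not match.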

Using MD5

To generate the MD5 checksum for a file (or group of files):

$ md5sum *.zip
97f5e00ce1975165bcbfba39f117d9dd  archive.zip
13c3f38e949bc6838c63f741655238e4  download.zip

You can save the MD5 checksums by redirecting the output from the command to a file.

$ md5sum *.zip > checksum.md5
$ cat checksum.md5
97f5e00ce1975165bcbfba39f117d9dd  archive.zip
13c3f38e949bc6838c63f741655238e4  download.zip

Using the saved checksums you can validate the integrity of the files.

$ md5sum -c checksum.md5
archive.zip: OK
download.zip: OK
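
In a script it is usually the exit status, rather than the per-file listing, that matters. The following is a minimal sketch assuming GNU md5sum, whose -c mode accepts --status; check_quietly is a hypothetical name used only for illustration.

```shell
# check_quietly CHECKSUM_FILE
# Hypothetical wrapper around "md5sum -c" for use in scripts.
check_quietly() {
    # --status suppresses all output; only the exit status reports the result.
    if md5sum -c --status "$1"; then
        echo "all files match"
    else
        echo "at least one file failed verification" >&2
        return 1
    fi
}
```

The file names stored in the checksum file are resolved relative to the current folder, so this should be run from the folder in which the checksums were generated.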

If you update the files and then regenerate the checksum file using a pattern that also matches the checksum file itself, the checksum file ends up in its own list. When you then validate the files you will get an error indicating that the checksum file was modified, but this should be expected as you just changed it!

$ md5sum -c checksum.md5
archive.zip: OK
checksum.md5: FAILED
download.zip: OK
md5sum: WARNING: 1 computed checksum did NOT match

Using find

To overcome the problem above you could simply delete the checksum file before regenerating the checksums, but what if you also don’t want to generate checksums for other files such as temporary or backup files?

To do this you can use ‘find’ to select which files will be used, or in this case ignored, when generating the checksums. Although this is a much longer command than just using ‘md5sum’ by itself it has the advantage that when checking the integrity of the files later you will not get an error message, and you can also modify it to exclude other types of file as well.

$ find . -maxdepth 1 ! -iname "*.tmp" ! -iname "*.md5" ! -name "*~" \
-type f -exec md5sum {} \; |tee checksum.md5
97f5e00ce1975165bcbfba39f117d9dd  ./archive.zip
13c3f38e949bc6838c63f741655238e4  ./download.zip
$
$ md5sum -c checksum.md5
./archive.zip: OK
./download.zip: OK
$
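
find makes no promise about the order in which it lists files, and -exec ... \; starts md5sum once per file. A hedged variant, assuming GNU find, sort and xargs (for -print0, -z, -0 and -r), sorts the names and hands them to md5sum in batches so that the checksum file comes out in a stable order. sorted_checksums is a hypothetical helper name.

```shell
# sorted_checksums DIR
# Hypothetical helper: writes DIR/checksum.md5 for the files directly in DIR,
# skipping temporary, checksum and backup files, in sorted order.
sorted_checksums() (
    cd "$1" || exit
    # NUL-separated names survive spaces and other odd characters;
    # xargs -r avoids running md5sum at all when nothing matched.
    find . -maxdepth 1 -type f \
        ! -iname '*.tmp' ! -iname '*.md5' ! -name '*~' -print0 |
        sort -z |
        xargs -0 -r md5sum > checksum.md5
)
```

A stable order makes it practical to compare two checksum files with diff.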

If you don’t specify the depth then you can generate a single checksum file for all the files in the current folder and all sub-folders, which is something that the ‘md5sum’ or ‘sha1sum’ commands can’t do by themselves.

$ find . ! -iname "*.tmp" ! -iname "*.md5" ! -name "*~" -type f \
-exec md5sum {} \; |tee checksum.md5
6b6e0c8e30f97c52132e260cdff298aa  ./.tmp/cache
97f5e00ce1975165bcbfba39f117d9dd  ./archive.zip
13c3f38e949bc6838c63f741655238e4  ./download.zip
d41d8cd98f00b204e9800998ecf8427e  ./src/source.c
aadd4f628604fd40e5a4f9bf80e60480  ./src/.source.old
$
$ md5sum -c checksum.md5
./.tmp/cache: OK
./archive.zip: OK
./download.zip: OK
./src/source.c: OK
./src/.source.old: OK

Since all this command does is use ‘find’ to select the files before generating the checksums, you have the flexibility to use any of the other features of the ‘find’ command. If you also want to exclude hidden files and folders, you can do so by adding the qualifier ! -path "*/\.*", which causes find to ignore any file whose path contains a component beginning with a ‘.’ (this includes ALL files in hidden folders).

$ find . ! -iname "*.tmp" ! -iname "*.md5" ! -name "*~" -type f \
! -path "*/\.*" -exec md5sum {} \; |tee checksum.md5
97f5e00ce1975165bcbfba39f117d9dd  ./archive.zip
13c3f38e949bc6838c63f741655238e4  ./download.zip
d41d8cd98f00b204e9800998ecf8427e  ./src/source.c
$
$ md5sum -c checksum.md5
./archive.zip: OK
./download.zip: OK
./src/source.c: OK
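
An alternative to filtering with ! -path is -prune, which stops find from descending into hidden folders at all, so their contents are never examined. This is a sketch under the assumption of GNU find; checksum_visible is a hypothetical helper name.

```shell
# checksum_visible DIR
# Hypothetical helper: writes DIR/checksum.md5 for all non-hidden files
# below DIR, without ever entering hidden folders.
checksum_visible() (
    cd "$1" || exit
    # Anything whose name starts with "." (except the start point "." itself)
    # is pruned: hidden folders are not entered and hidden files are skipped.
    # Everything else falls through to the tests after -o.
    find . \( -name '.*' ! -name '.' \) -prune -o \
        -type f ! -iname '*.tmp' ! -iname '*.md5' ! -name '*~' \
        -exec md5sum {} + > checksum.md5
)
```

On a large tree with big hidden folders this can be noticeably faster, since the hidden folders are never read.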

If you just want to generate checksums for files of a particular type then you don’t need to explicitly ignore any files (unless you don’t want to include hidden files or folders).

$ find . -iname "*.c" -type f ! -path "*/\.*" -exec md5sum {} \; \
|tee checksum.md5
d41d8cd98f00b204e9800998ecf8427e  ./src/source.c
$
$ md5sum -c checksum.md5
./src/source.c: OK

You can also use find to search for every folder before generating the checksums. This allows you to create a separate checksum file in the current folder and in every sub-folder with one command.

$ find . -type d -print -exec sh -c "cd '{}';find . -maxdepth 1 \
\! -iname '*.tmp' \! -name '*.md5' \! -name '*~' -type f \
-exec md5sum \{\} \; |tee checksum.md5" \;
.
97f5e00ce1975165bcbfba39f117d9dd  ./archive.zip
13c3f38e949bc6838c63f741655238e4  ./download.zip
./.tmp
6b6e0c8e30f97c52132e260cdff298aa  ./cache
./src
d41d8cd98f00b204e9800998ecf8427e  ./source.c
aadd4f628604fd40e5a4f9bf80e60480  ./.source.old
$
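
The command above substitutes each folder name directly into the sh -c string, which breaks on a folder whose name contains a single quote. A hedged alternative, assuming GNU find and folder names without newlines, reads the folder list in a loop instead. per_dir_checksums is a hypothetical helper name.

```shell
# per_dir_checksums DIR
# Hypothetical helper: writes a separate checksum.md5 into DIR and into
# every folder below it, covering only the files directly in that folder.
per_dir_checksums() {
    find "$1" -type d | while IFS= read -r dir; do
        (
            # The subshell keeps each cd from affecting the next iteration.
            cd "$dir" || exit
            # A folder with no matching files gets an empty checksum.md5.
            find . -maxdepth 1 -type f \
                ! -iname '*.tmp' ! -name '*.md5' ! -name '*~' \
                -exec md5sum {} + > checksum.md5
        )
    done
}
```

Running per_dir_checksums . from the top folder gives the same layout of checksum files as the one-line command, without printing the checksums as it goes.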

You can exclude files in hidden folders by excluding them from the directory search, but by itself this will not exclude hidden files (unless in a hidden folder).

$ find . -type d ! -path "*/\.*" -print -exec sh -c "cd '{}';find . \
-maxdepth 1 \! -iname '*.tmp' \! -name '*.md5' \! -name '*~' -type f \
-exec md5sum \{\} \; |tee checksum.md5" \;
.
97f5e00ce1975165bcbfba39f117d9dd  ./archive.zip
13c3f38e949bc6838c63f741655238e4  ./download.zip
./src
d41d8cd98f00b204e9800998ecf8427e  ./source.c
aadd4f628604fd40e5a4f9bf80e60480  ./.source.old
$

Note – The test that excludes the hidden folders must come before -print and -exec. find evaluates its expression from left to right, so only tests placed before an action affect what that action sees.

To exclude both hidden folders and hidden files you need to specify both explicitly.

$ find . -type d ! -path "*/\.*" -print -exec sh -c "cd '{}';find . \
-maxdepth 1 \! -iname '*.tmp' \! -name '*.md5' \! -name '*~' -type f \
! -path '*\/\\.*' -exec md5sum \{\} \; |tee checksum.md5" \;
.
97f5e00ce1975165bcbfba39f117d9dd  ./archive.zip
13c3f38e949bc6838c63f741655238e4  ./download.zip
./src
d41d8cd98f00b204e9800998ecf8427e  ./source.c
$

This has given me a few ideas for other useful things I can do using find – but they will have to wait for another day!
