audiodiff¶
audiodiff is a small Python library for comparing audio files. Two audio files are considered equal if they have the same audio streams and normalized tags.
Dependencies¶
audiodiff requires FFmpeg to be installed on your system. The default path to the executable is ffmpeg; you can change it in any of the following ways (later entries take precedence over earlier ones):
- audiodiff.FFMPEG_BIN module property
- FFMPEG_BIN environment variable
- --ffmpeg_bin flag (commandline tool only)
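The precedence rule above can be sketched in a few lines of Python. This is only an illustration of the stated order, not audiodiff's actual code: resolve_ffmpeg_bin and its parameters are hypothetical names, and DEFAULT_FFMPEG_BIN stands in for the audiodiff.FFMPEG_BIN module property.

```python
import os

# Stand-in for the audiodiff.FFMPEG_BIN module property (assumption).
DEFAULT_FFMPEG_BIN = "ffmpeg"


def resolve_ffmpeg_bin(module_value=DEFAULT_FFMPEG_BIN, environ=None, flag_value=None):
    """Return the FFmpeg path, letting later sources override earlier ones.

    Hypothetical helper: module property < FFMPEG_BIN environment
    variable < --ffmpeg_bin flag.
    """
    if environ is None:
        environ = os.environ
    path = module_value
    env_value = environ.get("FFMPEG_BIN")
    if env_value:
        path = env_value  # the environment variable overrides the module property
    if flag_value:
        path = flag_value  # the command-line flag overrides everything else
    return path
```

Passing environ explicitly keeps the helper easy to try out without touching the real environment.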
You can install FFmpeg with the following commands:
- Debian/Ubuntu: sudo apt-get install ffmpeg
- OS X (with Homebrew): brew install ffmpeg
Install¶
audiodiff can be installed with pip:
$ pip install audiodiff
This will also install the commandline tool. Run audiodiff -h for help.
Examples¶
Suppose you have two files, airplane.flac and airplane.m4a. The second one was obtained by converting the first one with an ALAC encoder, so its audio stream should be identical to the first one’s. After the conversion, you changed the tags in the FLAC file. You would then get the following results with audiodiff:
>>> import audiodiff
>>> audiodiff.equal('airplane.flac', 'airplane.m4a')
False
>>> audiodiff.audio_equal('airplane.flac', 'airplane.m4a')
True
>>> audiodiff.tags_equal('airplane.flac', 'airplane.m4a')
False
This means the two files are not the same because the tags differ, but the audio streams are identical.
If you want more information about those files, you can get stream checksums and tags:
>>> audiodiff.checksum('airplane.flac')
'ed871b3c164998cf243e39d4b97d21f93bba9427'
>>> audiodiff.checksum('airplane.m4a')
'ed871b3c164998cf243e39d4b97d21f93bba9427'
>>> tags1 = audiodiff.tags('airplane.flac')
>>> tags1
{'artist': 'f(x)', 'album': 'Pink Tape', 'title': 'Airplane'}
>>> tags2 = audiodiff.tags('airplane.m4a')
>>> tags2
{'title': 'f(x) - Pink Tape - Airplane'}
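Once you have the two tag dicts, plain Python is enough to see which tags differ. The helper below is a small sketch; tag_differences is a hypothetical name and not part of audiodiff, and the dicts are copied from the session above.

```python
def tag_differences(tags1, tags2):
    """Return {key: (value1, value2)} for every key whose values differ.

    A key missing from one dict is reported with None on that side.
    """
    differences = {}
    for key in sorted(set(tags1) | set(tags2)):
        value1 = tags1.get(key)
        value2 = tags2.get(key)
        if value1 != value2:
            differences[key] = (value1, value2)
    return differences


# Tag dicts as returned in the example session above.
tags1 = {'artist': 'f(x)', 'album': 'Pink Tape', 'title': 'Airplane'}
tags2 = {'title': 'f(x) - Pink Tape - Airplane'}

for key, (value1, value2) in tag_differences(tags1, tags2).items():
    print("%s: %r != %r" % (key, value1, value2))
```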
It can also be used as a commandline tool. When used as a commandline tool, it supports comparing audio files in two directories recursively. Audio files with the same name except for the extension are compared to each other.
$ ls . -R
mylib1:
a.flac b.flac cover.jpg
mylib2:
a.m4a b.m4a cover.jpg
$ audiodiff mylib1 mylib2
Audio streams in mylib1/a.flac and mylib2/a.m4a differ
Audio streams in mylib1/b.flac and mylib2/b.m4a differ
--- mylib1/b.flac
+++ mylib2/b.m4a
-album: [u'Purple Heart']
+album: [u'Blue Jean']
+date: [u'2001']
Binary files mylib1/cover.jpg and mylib2/cover.jpg differ
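The pairing rule (same name, ignoring the extension for audio files) can be sketched as follows. This is an illustration under assumptions, not the tool's implementation: comparison_key and pair_files are hypothetical names, and the extension list is taken from audiodiff.AUDIO_FORMATS in the API reference.

```python
import os

# Extensions treated as audio; copied from audiodiff.AUDIO_FORMATS (assumption
# that the command-line tool uses the same list).
AUDIO_EXTENSIONS = ("wav", "flac", "m4a", "mp3")


def comparison_key(filename):
    """Return the name used to match a file with its counterpart."""
    stem, ext = os.path.splitext(filename)
    if ext[1:] in AUDIO_EXTENSIONS:
        return stem  # audio files are matched without their extension
    return filename  # everything else must match by full name


def pair_files(names1, names2):
    """Pair up file names from two directories; None marks a missing file."""
    by_key1 = dict((comparison_key(name), name) for name in names1)
    by_key2 = dict((comparison_key(name), name) for name in names2)
    return [
        (by_key1.get(key), by_key2.get(key))
        for key in sorted(set(by_key1) | set(by_key2))
    ]
```

With the directory listing above, pair_files would match a.flac with a.m4a, b.flac with b.m4a, and cover.jpg with cover.jpg.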
Supported audio formats¶
Currently audiodiff recognizes only WAV, FLAC, M4A, and MP3 files as audio files. They must have the wav, flac, m4a, and mp3 file extensions, respectively. Note that WAV files are assumed to have no tags, because WAV tagging is inconsistent across applications.
Caveats¶
Tag reading is done by mutagenwrapper, for which there is no stable version yet. It may omit some tags, so audiodiff can incorrectly report that the tags of two files are equal when they are not.
Changes¶
Version 0.3¶
(release date to be announced)
- Improved Unicode support for tags and filenames.
- Changed the stream checksum algorithm from MD5 to SHA1.
- Added support for Python 2.6 and PyPy, in addition to Python 2.7.
Version 0.2¶
Initial release on September 10th 2013.
API reference¶
audiodiff¶
This module contains functions for comparing audio files.
- audiodiff.AUDIO_FORMATS = ['wav', 'flac', 'm4a', 'mp3']¶
Supported audio formats (extensions)
- audiodiff.FFMPEG_BIN = 'ffmpeg'¶
Default FFmpeg path
- audiodiff.equal(name1, name2, ffmpeg_bin=None)[source]¶
Compares two files and returns True if they are considered equal. For audio files, they are equal if their uncompressed audio streams and tags (as reported by mutagenwrapper, except for encodedby which is ignored) are equal. For non-audio files, they must have the same content to be equal.
- audiodiff.audio_equal(name1, name2, ffmpeg_bin=None)[source]¶
Compares two audio files and returns True if they have the same audio streams.
- audiodiff.tags_equal(name1, name2)[source]¶
Compares two audio files and returns True if they have the same tags, as reported by mutagenwrapper.
- audiodiff.checksum(name, ffmpeg_bin=None)[source]¶
Returns an SHA1 checksum of the uncompressed PCM (signed 24-bit little-endian) data stream of the audio file. Note that the checksums for the same file may differ across different platforms if the file format is lossy, due to floating point problems and different implementations of decoders.
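A checksum of this kind can be approximated by piping FFmpeg's raw PCM output into hashlib. The sketch below is not audiodiff's implementation: sha1_of_stream and pcm_checksum are hypothetical names, and the exact FFmpeg arguments are an assumption chosen to produce signed 24-bit little-endian PCM, so the result is not guaranteed to match audiodiff.checksum.

```python
import hashlib
import subprocess


def sha1_of_stream(stream, chunk_size=65536):
    """Return the hex SHA1 digest of everything read from a binary stream."""
    digest = hashlib.sha1()
    # Read in chunks so large audio streams are never held in memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def pcm_checksum(name, ffmpeg_bin="ffmpeg"):
    """Decode `name` to raw s24le PCM with FFmpeg and hash the result.

    The command line is an assumption, not taken from audiodiff's source.
    """
    command = [
        ffmpeg_bin, "-v", "error",  # only print errors
        "-i", name,
        "-vn",                      # ignore video streams such as cover art
        "-acodec", "pcm_s24le",
        "-f", "s24le",
        "-",                        # write the raw stream to stdout
    ]
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    try:
        checksum = sha1_of_stream(process.stdout)
    finally:
        process.stdout.close()
        returncode = process.wait()
    if returncode != 0:
        raise RuntimeError("%s exited with status %d" % (ffmpeg_bin, returncode))
    return checksum
```

Hashing the decoded stream rather than the file itself is what lets a FLAC file and its ALAC conversion produce the same checksum.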
- audiodiff.tags(name)[source]¶
Returns tags in the audio file as a dict. Its return value is the same as that of mutagenwrapper.read_tags, except that single-valued items (lists with length 1) are unwrapped and the encodedby tag is removed. To read unmodified, but still normalized tags, use mutagenwrapper.read_tags. For raw tags, use the mutagen library.
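The two adjustments described above (unwrapping single-valued lists and dropping encodedby) amount to a small dict transformation. The following is a sketch on a plain dict; simplify_tags is a hypothetical name, and the input is assumed to map tag names to lists of values.

```python
def simplify_tags(raw_tags):
    """Unwrap one-element lists and drop the encodedby tag.

    `raw_tags` is assumed to map tag names to lists of values.
    """
    simplified = {}
    for key, value in raw_tags.items():
        if key == "encodedby":
            continue  # ignored when comparing files
        if isinstance(value, list) and len(value) == 1:
            value = value[0]  # a single-valued item becomes the bare value
        simplified[key] = value
    return simplified
```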
- audiodiff.get_extension(path)[source]¶
Returns the file extension of the specified path. Example:
>>> get_extension('a.pdf')
'pdf'
>>> get_extension('b.js.coffee')
'coffee'
>>> get_extension('c')
''
>>> get_extension('d/e.txt')
'txt'
- audiodiff.is_supported_format(path)[source]¶
Returns True if the specified path has an extension that is one of the supported formats.
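Both helpers can be approximated with os.path.splitext. This is a minimal sketch that reproduces the documented examples, not the library's source. Whether the real functions treat upper-case extensions as supported is not stated here, so this version compares extensions exactly.

```python
import os

# Same value as audiodiff.AUDIO_FORMATS.
AUDIO_FORMATS = ['wav', 'flac', 'm4a', 'mp3']


def get_extension(path):
    """Return the extension of `path` without the leading dot."""
    return os.path.splitext(path)[1][1:]


def is_supported_format(path):
    """Return True if the extension of `path` is in AUDIO_FORMATS."""
    return get_extension(path) in AUDIO_FORMATS
```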
audiodiff.commandlinetool¶
This module contains functions for the audiodiff commandline tool.
- audiodiff.commandlinetool.FALLBACK_ENCODING = 'UTF-8'¶
Fallback encoding for output. Encoding resolution is done as follows:
- sys.stdout.encoding (or sys.stderr.encoding)
- PYTHONIOENCODING environment variable
- Second item of the return value of locale.getdefaultlocale()
- FALLBACK_ENCODING
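The resolution order can be sketched as a first-match search. The names resolve_encoding and output_encoding are hypothetical and this is not the module's code. As a substitution, the sketch reads the locale with locale.getlocale(), because locale.getdefaultlocale() is deprecated in recent Python versions.

```python
import locale
import os
import sys

FALLBACK_ENCODING = "UTF-8"


def resolve_encoding(stream_encoding, io_encoding, locale_encoding,
                     fallback=FALLBACK_ENCODING):
    """Return the first usable encoding, in the documented order."""
    if io_encoding:
        # PYTHONIOENCODING may look like "encoding:errors"; keep the encoding.
        io_encoding = io_encoding.split(":")[0]
    for candidate in (stream_encoding, io_encoding, locale_encoding):
        if candidate:
            return candidate
    return fallback


def output_encoding(stream=None):
    """Gather the candidates for a stream (sys.stdout by default)."""
    if stream is None:
        stream = sys.stdout
    return resolve_encoding(
        getattr(stream, "encoding", None),
        os.environ.get("PYTHONIOENCODING"),
        locale.getlocale()[1],  # substitute for locale.getdefaultlocale()[1]
    )
```

Keeping resolve_encoding free of global state makes the order easy to check with explicit values.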
- audiodiff.commandlinetool.parser = ArgumentParser(prog='audiodiff', usage=None, description='\nCompare two files or directories recursively. For supported audio files\n(flac, m4a, mp3), they are treated as if extensions are removed from filenames.\nFor example, `audiodiff x y` would compare `x/a.flac` and `y/a.m4a`. Audio\nfiles are considered equal if they have the same uncompressed audio streams and\nnormalized tags (except for `encodedby` tag) reported by mutagenwrapper;\nnon-audio files as well as unsupported audio files are equal if they are\nexactly equal, bit by bit.\n', version=None, formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=True)¶
- audiodiff.commandlinetool.main_func(args=None)[source]¶
The entry point for the audiodiff command line tool. Parses the command arguments and calls diff_checked().
- audiodiff.commandlinetool.diff_checked(path1, path2, options)[source]¶
Calls diff_recurse() and handles exceptions if raised.
- audiodiff.commandlinetool.diff_recurse(path1, path2, options)[source]¶
Recursively compares files in the specified paths.
- audiodiff.commandlinetool.diff_files(path1, path2, options)[source]¶
Compares the two files and prints the results.
- audiodiff.commandlinetool.diff_dirs(path1, path2, options)[source]¶
Compares the two directories and prints the results.
- audiodiff.commandlinetool.diff_streams(path1, path2, verbose=False, ffmpeg_bin=None)[source]¶
Prints whether the two audio files’ streams differ or are identical.
Prints whether the two audio files’ tags differ or are identical.