Tue, Mar 10, 2026
Lately I've been catching up on open source. This is the backstory behind the Cookiecutter release cascade: one quick release turned into four all-consuming releases, a licensing dispute, the removal of chardet, a new decision-tree classifier in BinaryOrNot, and my new interest in becoming an expert at designing classifiers.
Mon, Mar 9, 2026
BinaryOrNot identifies binary files three ways: by extension, by file signature, and by content analysis. Pass it any file path and it tells you whether the file is binary or text, accurately, across PNGs, PDFs, executables, archives, fonts, CJK-encoded text, and hundreds of other formats.
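To make the three stages concrete, here is a minimal sketch of an extension, signature, and content check run in that order. It is not BinaryOrNot's actual implementation: the function name `looks_binary`, the tiny `BINARY_EXTENSIONS` and `SIGNATURES` tables, and the content heuristic (null bytes plus a UTF-8 decode attempt) are all assumptions chosen for illustration. In particular, this sketch would misjudge text in encodings such as GBK or Shift-JIS, which a real classifier has to handle.

```python
import codecs
import os

# Hypothetical, deliberately tiny tables; a real tool covers far more formats.
BINARY_EXTENSIONS = {".png", ".pdf", ".zip", ".exe", ".ttf"}
SIGNATURES = (b"\x89PNG\r\n\x1a\n", b"%PDF-", b"PK\x03\x04", b"\x7fELF")
TEXT_BOMS = (
    codecs.BOM_UTF8,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
)


def looks_binary(path, chunk_size=1024):
    """Return True if the file at `path` looks binary (illustrative only)."""
    # Stage 1: extension. Cheap, and needs no file access.
    ext = os.path.splitext(path)[1].lower()
    if ext in BINARY_EXTENSIONS:
        return True

    with open(path, "rb") as f:
        chunk = f.read(chunk_size)

    # Stage 2: file signature (magic bytes at the start of the file).
    if chunk.startswith(SIGNATURES):
        return True

    # Stage 3: content analysis on the first chunk.
    if not chunk:
        return False  # assumption: treat an empty file as text
    if chunk.startswith(TEXT_BOMS):
        return False  # a Unicode byte order mark strongly suggests text
    if b"\x00" in chunk:
        return True  # null bytes are rare in text without a BOM
    try:
        # final=False tolerates a multi-byte character cut off at the chunk edge.
        codecs.getincrementaldecoder("utf-8")().decode(chunk, final=False)
    except UnicodeDecodeError:
        return True
    return False
```

Ordering the checks from cheapest to most expensive means most files are classified without ever reaching the content stage.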
Sat, Mar 7, 2026
If you've ever had BinaryOrNot misidentify a UTF-16 file, choke on a CJK-encoded document, or crash because chardet changed its API, this release is for you.
Mon, Nov 9, 2015
A common task in Python is walking a directory tree, opening each file, and doing something with the file's text. Here's what to do when you hit a UnicodeDecodeError from accidentally opening a binary file.
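One simple way to cope with this, sketched here using only the standard library, is to catch the error and skip the file. This is not necessarily the approach the post recommends; the helper name `iter_text_files` and the default UTF-8 encoding are assumptions for illustration.

```python
import os


def iter_text_files(root, encoding="utf-8"):
    """Yield (path, text) for each file under `root` that decodes cleanly."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding=encoding) as f:
                    text = f.read()
            except UnicodeDecodeError:
                # Most likely a binary file, or text in another encoding; skip it.
                continue
            yield path, text
```

Catching the exception is a blunt instrument: it reads the file before rejecting it, and it also skips legitimate text in other encodings. Checking whether a file is binary before opening it in text mode avoids both problems.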