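I have a directory full of files downloaded from the Internet, and it turned out that some of their names contain non-ASCII (UTF-8) characters, which broke indexing. So I had to find every file whose name contains such characters. The solution I found was the following:

    LC_ALL=C find . -name '*[! -~]*'

The bracket expression [! -~] matches any single character outside the printable ASCII range, from space (0x20) to tilde (0x7E). Setting LC_ALL=C makes find treat every byte as one character, so each byte of a multibyte UTF-8 sequence (0x80 and above) falls outside that range and the name matches. This command will print all filenames with embedded Unicode characters represented as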