macOS Sierra Fujitsu glitch – Workaround

macOS Sierra has a problem with PDF files created with the Fujitsu ScanSnap software. It is possible to lose data, and that is a big deal: you open your PDF and it is simply a blank page. In the meantime Fujitsu has released fixed versions.

The Workaround

If you want to be 100% safe, you should recreate your PDF files. My ScanSnap S1300i shipped with ABBYY FineReader included. All my files live in the great DEVONThink Pro. And no, it is not the Office version with integrated ABBYY FineReader. I am using a workflow that scans directly to ABBYY FineReader for ScanSnap and then hands the file over to Hazel. She processes the file and adds it to DEVONThink. That way I can add tags and rename files based on their content automatically.

So here is what you can do, prior to updating to Sierra:

  1. Run a detailed metadata search in DEVONThink for your scanner model. Searching for 1300i displays all files created by ScanSnap Manager for the S1300i.
  2. Select those files, right-click them and choose Open With… ABBYY FineReader.
  3. ABBYY FineReader then runs OCR on your PDF files. It creates new files with _OCR at the end of the file name. They are completely rewritten by ABBYY and not created the way ScanSnap Manager created PDFs before.
  4. Run the script below in the folder where all the PDFs live. If you are using DEVONThink, that is the files.noindex folder in your database.
  5. The script searches for all the _OCR.pdf files and writes their paths to a text file in your home directory. It then processes the text file line by line, removing _OCR from each file name and overwriting the original file.
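
Before renaming anything, it can help to check whether ABBYY really produced an _OCR copy for every file you handed over. The following is only a rough sketch: list_missing_ocr is a name I made up, and it simply lists every PDF below a folder that has no _OCR counterpart next to it. In a DEVONThink database this will also list PDFs that never came from the scanner, so read the output with that in mind.

```shell
#!/bin/bash

# list_missing_ocr is a hypothetical helper, not part of the original script.
# Usage: list_missing_ocr DIR
list_missing_ocr() {
  local dir="$1" f
  # Look at every PDF that is not itself an _OCR copy.
  find "$dir" -name "*.pdf" ! -name "*_OCR.pdf" -print0 |
    while IFS= read -r -d '' f; do
      # Print the file if no matching _OCR copy exists next to it.
      [ -e "${f%.pdf}_OCR.pdf" ] || printf '%s\n' "$f"
    done
}
```

Calling list_missing_ocr . in the folder from step 4 prints one path per line. No output means every PDF has an _OCR copy.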

Be careful!!!
Make a Backup first!!!
Try it in a separate folder with test data!!!

Here is the script:
Remove the "echo" to actually rename the files.
The echo is your safety belt: as long as it is there, the script only prints what would be done.

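#!/bin/bash

# Find all files that match the criteria
# and write their paths to a file.
find . -name "*_OCR.pdf" > ~/OCRFiles.txt

# Read the file and process each line.
# IFS= and -r keep spaces and backslashes in paths intact.
while IFS= read -r f; do

# Strip only the trailing _OCR, so folder names are left alone.
echo mv "$f" "${f%_OCR.pdf}.pdf"

done < ~/OCRFiles.txt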
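
To try this on test data first, a throwaway folder with a few dummy files is enough. This is a minimal sketch: the folder and file names (invoice.pdf, "sub folder" and so on) are invented for the test, and the files contain plain text instead of real PDF data. It runs the same kind of dry-run loop and only prints the mv commands.

```shell
#!/bin/bash

# Build a scratch folder with dummy files (all names are made up).
scratch="$(mktemp -d "${TMPDIR:-/tmp}/ocrtest.XXXXXX")"
mkdir -p "$scratch/sub folder"
printf 'original'  > "$scratch/invoice.pdf"
printf 'rewritten' > "$scratch/invoice_OCR.pdf"
printf 'original'  > "$scratch/sub folder/letter 2016.pdf"
printf 'rewritten' > "$scratch/sub folder/letter 2016_OCR.pdf"

# Dry run inside the scratch folder: print the mv commands, rename nothing.
dry_run_output="$(
  cd "$scratch" &&
  find . -name "*_OCR.pdf" | while IFS= read -r f; do
    echo mv "$f" "${f%_OCR.pdf}.pdf"
  done
)"
echo "$dry_run_output"
```

You should see one mv line per _OCR file, and the dummy files stay untouched. Delete the scratch folder when you are done.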

The old files I had, created with ScanSnap Manager, are all readable and editable.
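
If you would rather keep the old ScanSnap files around instead of overwriting them, a variant along these lines might do. It is a sketch under my own assumptions, not what I ran: replace_with_ocr and the .bak suffix are made up, and I have not checked how DEVONThink reacts to extra .bak files inside files.noindex, so try it on a copy first.

```shell
#!/bin/bash

# replace_with_ocr is a hypothetical helper, not part of the original script.
# Usage: replace_with_ocr DIR
replace_with_ocr() {
  local dir="$1" f target list
  # Collect the paths first, so renaming does not disturb the search.
  list="$(mktemp "${TMPDIR:-/tmp}/ocrlist.XXXXXX")"
  find "$dir" -name "*_OCR.pdf" -print0 > "$list"

  while IFS= read -r -d '' f; do
    target="${f%_OCR.pdf}.pdf"
    # Keep the old file as .bak instead of overwriting it.
    if [ -e "$target" ]; then
      mv "$target" "$target.bak"
    fi
    mv "$f" "$target"
  done < "$list"

  rm -f "$list"
}
```

Run it as replace_with_ocr . in the folder with the PDFs. Once you have checked the new files, the .bak copies can be deleted.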

Disclaimer

You are using this procedure at your own risk!!!
If you are not familiar with the terminal and do not really understand what is written above, then do not touch your data. Read the Fujitsu website and follow their instructions.

 

Automator & PDFpen AppleScript for Batch OCR

If you need batch OCR for PDF files and own a Mac, then here is the script that saves your life 😉

First, you have to make sure that PDFpen is installed on your Mac.

Second, just create an Automator workflow as a "Service". Make sure you set it to process "files and folders", and add "Run AppleScript" to it. Then paste the script into the "Run AppleScript" section. Make sure you have cleared the section first, otherwise you will end up with two run statements 😉

Here is the script for copy and paste…

on run {input, parameters}
    set myCount to count input
    -- Process each file handed over by Automator
    repeat with i from 1 to myCount
        set this_item to item i of input
        tell application "PDFpen 7"
            open this_item
            tell document 1
                ocr
                -- Wait until PDFpen has finished the OCR run
                repeat while performing ocr
                    delay 1
                end repeat
                delay 1
                close with saving
            end tell
        end tell
    end repeat
    -- Quit PDFpen once all files are done
    tell application "PDFpen 7"
        quit
    end tell
end run

Now save the Automator workflow as a Service and it will be available in the context menu in Finder. Select the files you want to OCR and have fun…