Apple's Vision APIs for text detection and extraction don't get enough credit.
I like them so much that I've even made a CLI app called MacOCR that lets you turn any text on your screen into text on your clipboard. When you invoke the ocr command, a screen-capture-style cursor appears, and any text within the bounds you select is copied to your clipboard.
You don't need to be a developer to use these APIs as they're made readily accessible via Apple Shortcuts.
The iOS Shortcuts app is a tool that lets you quickly complete tasks on your iPhone or iPad. You can create custom workflows with actions from apps on your phone, or use actions that come with the operating system (such as the "Extract Text from Image" action).
Our Apple Shortcut does a few things:
Using Apple Shortcuts to take a photo of any text you need to read and run Optical Character Recognition (OCR) on it is pretty easy. The prompt is the more interesting part.
You can use OpenAI's completion API to convert unstructured data into structured JSON. To do this, give the model a few examples of the kind of data it is likely to receive and of the JSON object you want back for each one.
This is what our prompt looks like:
All we have to do is merge the prompt with our OCR data.
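To make the merge step concrete, here is a rough Python sketch of the same idea outside of Shortcuts. It is not the prompt or the logic the Shortcut itself uses: the example receipts, the field names (`merchant`, `total`), and the helper names `build_prompt` and `parse_completion` are all made up for illustration. It only shows how few-shot examples and fresh OCR text might be joined into one prompt, and how the reply could be read back as JSON.

```python
import json

# Hypothetical few-shot examples: (raw OCR text, JSON object we want back).
# These are invented for illustration and are not the Shortcut's real prompt.
EXAMPLES = [
    ("ACME Coffee\nLatte 4.50\nTotal 4.50",
     {"merchant": "ACME Coffee", "total": 4.50}),
    ("Corner Books\nNovel 12.00\nTotal 12.00",
     {"merchant": "Corner Books", "total": 12.00}),
]


def build_prompt(ocr_text, examples=EXAMPLES):
    """Merge the few-shot examples and new OCR text into one prompt string."""
    parts = ["Convert the text into a JSON object.", ""]
    for raw, structured in examples:
        parts += ["Text:", raw, "JSON:", json.dumps(structured), ""]
    # End on an open "JSON:" label so the completion is the object itself.
    parts += ["Text:", ocr_text.strip(), "JSON:"]
    return "\n".join(parts)


def parse_completion(completion):
    """Parse the model's reply; raises json.JSONDecodeError if it is not JSON."""
    return json.loads(completion.strip())


if __name__ == "__main__":
    # Print the merged prompt for a made-up OCR result.
    print(build_prompt("Bike Shop\nTube 8.00\nTotal 8.00"))
```

Ending the prompt on an open `JSON:` label is one common way to nudge a completion model into returning only the object. The reply can still be malformed, so the parsing step is worth wrapping in error handling.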
The Shortcut actions to do this are as follows:
To install the Shortcut, you need an OpenAI account and an API token. To get these, follow the instructions in the OpenAI documentation. Once you have both, you can download the Shortcut from here.