
Privacy, local AI and a code gap that still matters

I received some of my annual medical test results this week and needed to redact my personal details before sharing them. Standard stuff - name, date of birth, Medicare number, address. But the files were JPEGs, not a Word document I could just edit.

My first instinct was to upload them to Claude and let it handle everything. Then I stopped myself. Medical records. Cloud service. Maybe not.

I have a full local LLM stack running on my Mac - Open WebUI fronting Ollama, with Gemma and Qwen among the models available. The whole point of that setup is exactly this: processing sensitive material without it leaving the machine. This was the moment to use it.
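For anyone wondering what "not leaving the machine" means concretely: Open WebUI is just a front end, and everything ultimately goes through Ollama's local API on localhost. A quick sanity check looks something like this - the model tag is an assumption on my part; use whatever `ollama list` shows on your machine:

```python
# Talk to the local Ollama API directly. Everything below stays on
# localhost - no request ever leaves the machine.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma3:27b",  # assumed tag - check `ollama list`
        "prompt": "Summarise the privacy risks of cloud-hosted LLMs.",
        "stream": False,
    },
)
print(resp.json()["response"])
```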

The analysis was impressive

I uploaded the medical report images to Gemma 3 27B through Open WebUI and asked it to identify all confidential information.

The response was genuinely excellent. Gemma correctly identified four fields - patient name, date of birth, Medicare number and address - described their approximate position on each page in millimetres, suggested appropriate redaction strategies for each, and flagged considerations around OCR accuracy and layout variation across pages. If you’d asked me to write a specification document for a developer, that’s exactly what I’d have produced.

So far, local AI: 1. Privacy concern: addressed.

Then it had to write the code

I asked Gemma to produce a Python script to do the actual redaction.

It imported pytesseract - the standard Python wrapper for the Tesseract OCR engine - and then never used it. The entire “redaction” was a handful of hardcoded pixel coordinates: x1: 60, y1: 25, x2: 250, y2: 45. On a scanned medical document at 300 DPI, those coordinates point to nothing in particular. The script would have drawn small black rectangles somewhere near the margin and called it done. All that careful analysis about field positions - ignored entirely in the implementation.
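Stripped to its essentials, the logic looked something like this - my paraphrase of the shape of the failure, not the verbatim script:

```python
import pytesseract  # imported, then never called
from PIL import Image, ImageDraw

img = Image.open("report_page_1.jpg")  # placeholder filename
draw = ImageDraw.Draw(img)

# Guessed pixel coordinates - meaningless on a 300 DPI scan.
draw.rectangle([60, 25, 250, 45], fill="black")

img.save("redacted_page_1.jpg")
```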

I tried Qwen2.5-coder:32b next, which is specifically a code-focused model. It made more effort - a preprocessing step to sharpen the image, a two-stage approach that used OCR to verify text within the bounding boxes. Better architecture on paper. But it referenced np.array without ever importing numpy. It would have crashed on the first image before redacting a single pixel. And underneath the more elaborate structure, the same hardcoded coordinates sat there, unchanged.

The fundamental problem with both scripts: coordinates-first, OCR-as-an-afterthought. The right approach is to OCR the whole image, get real bounding boxes from the engine itself, then pattern-match against those. No guessing where anything is.
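With pytesseract that looks roughly like this - a minimal sketch, assuming Tesseract is installed locally and "page.jpg" stands in for a real scan:

```python
import pytesseract
from PIL import Image

# image_to_data returns every recognised word together with the
# bounding box the engine itself measured - no guessing.
data = pytesseract.image_to_data(
    Image.open("page.jpg"), output_type=pytesseract.Output.DICT
)

for i, word in enumerate(data["text"]):
    if word.strip():
        print(word, data["left"][i], data["top"][i],
              data["width"][i], data["height"][i])
```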

What actually worked

I took the problem to Claude - the cloud version, yes, the irony is noted - described the correct approach and asked for a proper implementation.

The working script OCRs each image at the word level, groups words into lines, runs regex patterns against each line (name, DD/MM/YYYY dates, 10-digit Medicare numbers, address fragments) and draws redaction boxes using the coordinates Tesseract itself reports. It ran a dry-run across all five images - four medical report pages and a radiology report - and correctly identified every confidential field on the first pass.
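Reconstructed from that description - this is my sketch of the approach, not Claude’s actual script - the core of it looks something like:

```python
import re
from collections import defaultdict

import pytesseract
from PIL import Image, ImageDraw

# Illustrative patterns only; the real script also matched the patient
# name and address fragments.
PATTERNS = [
    re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),  # DD/MM/YYYY dates
    re.compile(r"\b\d{10}\b"),             # 10-digit Medicare numbers
]

def redact(path, out_path):
    img = Image.open(path)
    data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
    draw = ImageDraw.Draw(img)

    # Group word indices into lines using Tesseract's own line numbering.
    lines = defaultdict(list)
    for i, word in enumerate(data["text"]):
        if word.strip():
            key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
            lines[key].append(i)

    for indices in lines.values():
        line_text = " ".join(data["text"][i] for i in indices)
        if any(p.search(line_text) for p in PATTERNS):
            # Draw boxes from the coordinates Tesseract reports.
            for i in indices:
                x, y = data["left"][i], data["top"][i]
                w, h = data["width"][i], data["height"][i]
                draw.rectangle([x, y, x + w, y + h], fill="black")

    img.save(out_path)  # original left untouched
```

This sketch blacks out the whole matching line rather than just the matched words, which is cruder than the real thing - but the shape is the point: every coordinate comes from the OCR engine, never from guesswork.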

There was one small refinement. The date pattern was catching the report date and collection date as well as the date of birth. One quick tweak - exclude any DD/MM/YYYY date where the year is the current year - and the false positives disappeared. Five clean redacted images, originals untouched.
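In code, the tweak amounts to something like this - my sketch of the logic, not the actual patch:

```python
import re
from datetime import date

DATE_RE = re.compile(r"\b\d{2}/\d{2}/(\d{4})\b")

def should_redact_date(line_text):
    """Redact DD/MM/YYYY dates except those from the current year
    (report and collection dates), which leaves the date of birth."""
    m = DATE_RE.search(line_text)
    return bool(m) and int(m.group(1)) != date.today().year
```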

The performance question

Speed is the other variable. I’m running an M4 Max MacBook Pro with 48GB of RAM. This is not modest hardware. Gemma 3 27B is a large model but well within what this machine can handle.

The analysis response was slow. Not agonising, but noticeably slower than a cloud model. For reading a document and describing what you see, that’s an acceptable trade-off when privacy is the priority.

The code generation was both slower and worse. That combination is what stings.

Where this leaves local LLMs

The privacy instinct was right. Medical records should not go to a cloud service if you can avoid it.

The capability gap is real. Local models in 2026, on good hardware, are genuinely useful for analysis, description and reasoning about documents. The moment you ask them to produce working code for a moderately complex task, results become uneven - and code that fails silently is worse than no code at all.

The Gemma analysis was good enough to use directly as a specification. That specification was then handed to a cloud model to produce the implementation. Which is a workflow, I suppose - just not the one I was hoping for.

Of course, the most efficient workflow would have been a big black Sharpie: blot out the details and re-scan. Works for me, but it wouldn’t scale. Besides, why not take the chance to test my local LLM stack? I will persist … hope my patience holds.

✍️ Reply by email