Skip to content

M12 solution

The expected, fully-commented artifact for M12's lab. Peek only after you've tried.

File What it is
describe_image.py Image understanding (vision): reads an image, base64-encodes it, sends an image content block + a question, prints the answer. Detects media_type from the extension and handles unsupported types.
sample.png A small generated test image (a steel-blue rectangle) so the app runs before you bring your own photo.

Run it

With your venv active and your M4 .env present:

python describe_image.py sample.png "What colour is this image?"
python describe_image.py my-photo.jpg "Read all the text in this image."

How this was verified

Verified on Python 3 with the live model call mocked (no key): - compiles; - the vision pipeline reads sample.png, base64-encodes it for real (300 chars), and builds the request with a correct image block (type: base64, media_type: image/png) plus the text block, confirmed by inspecting what the mocked client received; - an unsupported extension (.bmp) is handled gracefully (clear message, no crash).

The only unverified step is the live vision call (needs a real key), the learner's run. Image generation / audio are out of scope for the Claude stack (see notes' survey). No API key or billed call was used here.