M12 solution
The expected, fully-commented artifact for M12's lab. Peek only after you've tried.
| File | What it is |
|---|---|
describe_image.py |
Image understanding (vision): reads an image, base64-encodes it, sends an image content block + a question, prints the answer. Detects media_type from the extension and handles unsupported types. |
sample.png |
A small generated test image (a steel-blue rectangle) so the app runs before you bring your own photo. |
Run it
With your venv active and your M4 .env present:
python describe_image.py sample.png "What colour is this image?"
python describe_image.py my-photo.jpg "Read all the text in this image."
How this was verified
Verified on Python 3 with the live model call mocked (no key):
- compiles;
- the vision pipeline reads sample.png, base64-encodes it for real (300 chars), and builds the
request with a correct image block (type: base64, media_type: image/png) plus the text
block, confirmed by inspecting what the mocked client received;
- an unsupported extension (.bmp) is handled gracefully (clear message, no crash).
The only unverified step is the live vision call (needs a real key), the learner's run. Image generation / audio are out of scope for the Claude stack (see notes' survey). No API key or billed call was used here.