This seems like a systemic reverse-centaur: you can't build "AI-driven inspection systems" with general-purpose LLMs, you can only use LLMs and then validate *their* output with other, reliable[1] methods. Whoever decided to use LLMs like that is unfit for the job.
[1] ML-based image classification, trained on a very narrow set of labeled data, can be more reliable than a human. Try to achieve the same numbers with LLMs, I'll wait!
RE: https://aus.social/@perkinsy/116825631938165912