I’m trying to do the following:
- I want to use some sort of (free) AI API to perform automatic image recognition.
- The AI must base its analysis (and extract the corresponding specific details) on a provided text input that tells it which detail to focus on during the image analysis.
e.g. if I provide the AI with an image of a playground full of joggers, with a person on a red bicycle in the top left and another one on a green bicycle in the bottom right, I must also be able to provide it with the text “tell me the color of the bicycle you see in the top left” and the output must be “red”.
Or, if it’s an image of a winter playground, I need to be able to tell the AI “look at the tree in the top left, recognize its characteristics: does it seem bare or lush?” and the output should be “bare”.
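In other words, the interface I’m after would look roughly like this (the function name and signature are placeholders I made up, not an existing API):

```python
# Hypothetical interface I'm looking for -- the name and signature
# are placeholders, NOT a real library call.
def ask_about_image(image_path: str, question: str) -> str:
    """Given an image and a free-text question about it, return a short answer."""
    raise NotImplementedError  # this is exactly the part I don't know how to build

# Intended usage:
#   ask_about_image("playground.jpg", "tell me the color of the bicycle in the top left")
#   should return "red"
```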
Since I didn’t find any API that does what I’m asking for in one go, I tried to manually combine the textual input with the tags that the Google Cloud Vision API image analysis provides.
The specific area I’m working on is recognizing a certain bird’s sex based on chromatic details of its body:
from google.cloud import vision

# Client for the Google Cloud Vision API (credentials set up separately)
client = vision.ImageAnnotatorClient()

# Predefined list of compatible details (just an example)
compatible_details = ["yellow chest", "green chest", "red head", "blue head"]

# Process the textual input to extract details
def process_textual_input(textual_input):
    extracted_details = []
    for detail in compatible_details:
        if detail in textual_input:
            extracted_details.append(detail)
    return extracted_details

# Analyze an image and match the extracted details against its labels
def analyze_image_with_details(image_path, extracted_details):
    # Read the image and send it to the Google Cloud Vision API
    with open(image_path, "rb") as image_file:
        content = image_file.read()
    image = vision.Image(content=content)
    response = client.label_detection(image=image)
    labels = response.label_annotations
    # Compare the extracted details with the identified labels
    results = []
    for detail in extracted_details:
        for label in labels:
            if detail.lower() in label.description.lower():
                results.append((detail, "male" if "yellow" in detail.lower() else "female"))
    return results

if __name__ == "__main__":
    # Example textual input
    textual_input = "Look at the chest of the bird. If it's yellow, then it's male."
    # Process textual input to get details
    extracted_details = process_textual_input(textual_input)
    # List of images to analyze
    images = ["image1.jpg", "image2.jpg", "image3.jpg"]
    # Analyze each image with the extracted details
    for image_path in images:
        results = analyze_image_with_details(image_path, extracted_details)
        print(f"Image: {image_path}")
        for detail, gender in results:
            print(f"Detail: {detail}, Gender: {gender}")
But of course, as you can see, I’m only statically extracting keywords from my text, not letting the AI actually reason about and comprehend it.
How could I achieve this?
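To make the limitation concrete, here is the keyword matcher above in isolation: it only fires on the exact substrings listed in `compatible_details`, so even a trivial paraphrase slips through.

```python
compatible_details = ["yellow chest", "green chest", "red head", "blue head"]

def process_textual_input(textual_input):
    # Naive substring matching against a fixed whitelist.
    return [d for d in compatible_details if d in textual_input]

# Exact wording is found:
print(process_textual_input("Look at the yellow chest of the bird"))  # ['yellow chest']
# A simple paraphrase is missed entirely:
print(process_textual_input("Is the bird's breast yellow?"))          # []
```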