What is Visual Search?
Visual search (VS) lets users find a subject of interest in digital media without manually watching every video. VS uses computer vision (CV) to extract, label, and identify objects in digital media. The functional components of VS are:
Figure 1: A Functional Overview of Visual Search
Models: A model is an artificial neural network that helps identify objects embedded within the media.
Services: Applications can incorporate VS by integrating with the APIs of three major services: media capture, CV, and search and indexing.
Computer Vision: CV is a field of artificial intelligence that trains computers to interpret and understand the visual world. The media goes through a standard set of stages to identify and verify extracted objects using the models.
Actions: Several actions may result from processing media or searching within media:
- Metadata labels extracted as part of media processing are indexed to support future searches for objects within the media.
- Results of search queries that span several media files are collated and presented on a web page or as a notification.
- Results of similarity searches that use an image as the query are presented with the image representing each matching object, as in a shopping experience.
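
The actions above can be illustrated with a minimal sketch. It is not a description of any particular VS product: the class `LabelIndex`, the functions `cosine_similarity` and `most_similar`, the file names, and the three-element feature vectors are all hypothetical. The sketch assumes that media processing yields (media file, label, timestamp) records and that each image can be reduced to a numeric feature vector; cosine similarity is one common choice for comparing such vectors, not necessarily the one a given service uses.

```python
from collections import defaultdict
from math import sqrt


class LabelIndex:
    """Hypothetical inverted index: label -> {media_id: [timestamps in seconds]}."""

    def __init__(self):
        self._postings = defaultdict(lambda: defaultdict(list))

    def add(self, media_id, label, timestamp_s):
        # Index one metadata label extracted during media processing.
        self._postings[label.lower()][media_id].append(timestamp_s)

    def search(self, label):
        # Collate hits for one label across every indexed media file.
        hits = self._postings.get(label.lower(), {})
        return {media_id: sorted(times) for media_id, times in sorted(hits.items())}


def cosine_similarity(a, b):
    # Cosine of the angle between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_vec, catalog, top_k=3):
    # Rank catalog items (item_id -> feature vector) by similarity to the query.
    scored = [(cosine_similarity(query_vec, vec), item_id)
              for item_id, vec in catalog.items()]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:top_k]]


# Usage: index labels from two hypothetical clips, then search across both.
index = LabelIndex()
index.add("clip_a.mp4", "Bicycle", 12.5)
index.add("clip_a.mp4", "bicycle", 3.0)
index.add("clip_b.mp4", "bicycle", 47.0)
index.add("clip_b.mp4", "dog", 5.0)
print(index.search("bicycle"))

# Usage: find catalog items that look most like a query image's vector.
catalog = {
    "red_shoe": [0.9, 0.1, 0.0],
    "blue_shoe": [0.7, 0.3, 0.1],
    "lamp": [0.0, 0.2, 0.9],
}
print(most_similar([1.0, 0.0, 0.0], catalog, top_k=2))
```

Keeping timestamps in the index lets a result page link to the moment an object appears rather than only to the file. In the similarity case, the returned item identifiers would be used to look up the images shown to the user.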