A number of different technologies power visual search. This article gives examples of a few implementations of visual search technology, with a focus on Google's 'computer vision'.
As business owners and search marketers, we know that images are important to our customers and potential customers. We also know that today’s consumers are much more likely to make searches using images. Fortunately, Google’s technology for understanding images has progressed in leaps and bounds over the last few years.
In order to make best use of the images that we upload to our Google Business Profiles and our websites, it’s helpful to understand how Google uses technology to interpret our images.
In the following guide we’ll:
- define visual search;
- explore Google’s visual search technology;
- discuss how technology is used to understand the content of our images and how Google uses this information to serve the most relevant search results;
- talk about the user intent behind visual search and where it might fit in the buyer’s journey; and
- outline the things that we can do to make better use of images.
What is visual search?
A search made using images rather than text or voice is commonly referred to as a ‘visual search’, and the technology employed to make sense of an image search is called ‘computer vision’ (more on this below).
Visual search is not to be confused with image search. When you make an image search you use words to search for images relevant to your query, whereas with visual search the image itself is the query.
Early versions of visual search relied on computers making sense of images using the text embedded in image metadata. Later on, visual search was commonly based around matching images online. If you've ever been involved in link building activities, you might remember using reverse image search to find instances of your own or your clients' images online that had been used without the appropriate credit.
These days, visual search is much more nuanced in terms of both the image input and the types of results that are returned.
Why are images so important to us?
Much has been written about the rise of our visual internet culture, and we’re all very familiar with websites that emphasize visual media, such as Pinterest and Instagram. But why are images so important and powerful?
Well, approximately 50% of the surface of the human brain is devoted to processing visual information—visuals are much more powerful and easier to understand than just text, as well as being easier to recall.
At least 65% of people are thought to be visual learners, plus humans have a remarkable ability to remember pictures (you can find all the science stuff behind this here).
As web users, visual search gives us the opportunity to search in a way that isn’t possible using words alone. And, as marketers, visual search gives us an opportunity to attract and convert customers.
In a BrightLocal study, 60% of consumers said local search results with good images encourage them to make a decision. As marketers and business owners, we need to make sure that our images are relevant, high quality, and compelling in order to appeal to our visually-oriented potential customers!
Throughout the years, Google’s search engine results pages (SERPs) have become increasingly visual and we’re used to seeing search results—on both desktop and mobile—that brim with images.
In this search for ‘oak furniture near me’, Google has pulled what it believes are the most relevant images from the ranking GBPs and websites into the SERP itself. In the example below, we can see images appearing in the Local Pack and organic results on desktop:
As local marketers, we know that our image content and the way that it appears in the SERPs is likely to make a big difference to our potential customers in terms of consideration and conversion.
Marketers and local businesses have often thought of their GBP cover photo as being the most important image on their profile—this is often (but not always) the case for a branded search. When businesses are competing for a spot in the Local Pack, Google will choose the image from the ranking GBPs that it understands as the most relevant for the query, and this is often NOT the cover photo.
Google’s Visual Search Technology
In the same way that Google has developed in terms of understanding web pages based on the text content (read this for an excellent overview of how search works), Google has also made great progress in evolving its computer vision technology. This has been achieved by creating a system that can increasingly process, analyze, and make sense of visual data in a similar way to how humans do.
For many, one of the most familiar new technologies in visual search is Google’s Cloud Vision API, which can detect objects and faces, read printed and handwritten text, and assign labels to images in order to classify them into millions of predefined categories.
To see an example of this, upload one of your business images into the Cloud Vision API demo and note what Google is ‘seeing’ when it processes your image.
In the example below, I’ve uploaded an image of a customer enjoying a ‘sloth feeding experience’. Does Google ‘see’ the image the way that I’d like it to be understood?
Face Detection detects multiple faces within an image along with the associated key facial attributes, such as emotional state or whether a person is wearing headwear. In this image, both of the people are wearing face masks, which will hinder Google's ability to process their faces. Google has identified two faces here but sadly isn't assigning 'joy' to either of them!
The Vision API can detect and extract multiple objects in an image. Here it appears to be identifying the sloth as a cat. This would suggest that additional photos need to be uploaded in which Google can better recognize the sloth, and perhaps the carrot it is being fed.
The Vision API can detect and extract information about entities in an image, across a broad group of categories. Labels can identify general objects, locations, activities, animal species, products, and more. Google is assigning multiple labels to this business, and not necessarily the ones that the business would want assigned.
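To make label detection concrete, here is a minimal sketch of how you might filter the `labelAnnotations` block of a Vision API `images:annotate` JSON response down to the labels Google is most confident about. The labels and scores below are illustrative, echoing the sloth-as-cat example above, not real API output:

```python
import json

# Sample labelAnnotations from a Vision API images:annotate response
# (labels and scores are illustrative, not real API output).
sample = json.loads("""
{"responses": [{"labelAnnotations": [
    {"description": "Cat", "score": 0.91},
    {"description": "Carrot", "score": 0.84},
    {"description": "Whiskers", "score": 0.62}]}]}
""")

def confident_labels(response, min_score=0.75):
    """Return label descriptions the API scored at or above min_score."""
    labels = response["responses"][0]["labelAnnotations"]
    return [label["description"] for label in labels
            if label["score"] >= min_score]

print(confident_labels(sample))  # ['Cat', 'Carrot']
```

If the high-confidence labels aren't the ones you'd want associated with your business, that's the signal to reshoot or replace the image.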
SafeSearch Detection detects explicit content such as adult content or violent content within an image. This feature uses five categories (adult, spoof, medical, violence, and racy) and returns the likelihood that each is present in a given image.
In the image above, Google is classing the content as possibly ‘racy’, so the business would do well to reshoot versions of this image. This could help Google to class each of these categories as ‘unlikely’ or ‘very unlikely’.
TOP TIP: If you’re having an issue with rejection of GBP posts then run the image in question through the free API test to see if Google is reading your image as containing adult content.
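If you want to run that check in bulk rather than one image at a time in the demo, the `safeSearchAnnotation` block of a Vision API JSON response can be screened with a few lines of code. A minimal sketch, assuming a response shaped like the REST `images:annotate` output; the likelihood values below are illustrative:

```python
import json

# Likelihood scale used by the Vision API, from least to most likely.
LIKELIHOOD = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]

# Sample safeSearchAnnotation from an images:annotate response
# (values are illustrative, echoing the 'racy' example above).
sample = json.loads("""
{"responses": [{"safeSearchAnnotation": {
    "adult": "VERY_UNLIKELY", "spoof": "UNLIKELY", "medical": "UNLIKELY",
    "violence": "VERY_UNLIKELY", "racy": "POSSIBLE"}}]}
""")

def flag_categories(response, threshold="POSSIBLE"):
    """Return SafeSearch categories at or above the given likelihood."""
    annotation = response["responses"][0]["safeSearchAnnotation"]
    cutoff = LIKELIHOOD.index(threshold)
    return [category for category, value in annotation.items()
            if LIKELIHOOD.index(value) >= cutoff]

print(flag_categories(sample))  # ['racy']
```

Any category flagged at 'POSSIBLE' or above is a candidate for a reshoot, per the advice above.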
Understanding Google AI
Understanding how Google uses Vision AI to identify and process images reminds us that the images we use as organizations are our opportunity to feed Google with information about our businesses, products, and services via the visual medium.
If Google isn’t ‘seeing’ what we want it to see when it processes our images via computer vision technology, then we need to think about how we can update and add to our images so Google ‘reads’ them as we’d like.
Visual Search and User Intent
Just like when someone makes a text search or a voice search, Google needs to serve results based on what it thinks is the user intent of that search.
The intent is the primary goal the user has when making a search. Search intent can be inferred from many contextual elements of the search, including the device used, the searcher's location, factors such as time of day, world events, and time of year, and the content of the search query itself (be it text, voice, or visual).
Textual queries when entered into a search engine are considered by many to fall into five broad intent buckets:
- Informational queries—the searcher needs information because they want to find an answer to a question or to learn something.
- Navigational queries—the searcher wants to go to a particular website or resource on the internet.
- Commercial investigation—the searcher wants to compare products or services to find the best one to suit their needs.
- Transactional queries (also referred to as commercial or high intent)—the searcher wants to buy something or is ready to take an action (such as place an order or make a call).
- Local queries—the searcher wants to find something local to them or in a specific geographic area.
At present, visual searches likely fall mostly into the informational (“what’s the name of this wild flower?”) and transactional (“where can I buy a bag like this?”) categories.
You can learn more about search intent through the BrightLocal Academy course—How to Master Local Keyword Research.
Examples of Visual Search in Action
Let’s take a quick look at some of the ways that websites and apps are incorporating visual search.
On the Pinterest platform, Pinterest Lens allows users to use their smartphone camera or existing pinned images to surface related shoppable products, enabling them to ‘shop with their camera’:
The search intent on the ‘shop’ tab is unsurprisingly ‘transactional’—users can browse the results and go ahead and buy any of the available products.
Many online retailers have incorporated visual search into their websites in order to surface relevant products for their users. IKEA is doing so through its app and website to allow users to search IKEA products using a photo:
Here the implied intent is purely transactional. Users can browse and buy any number of IKEA products that the computer vision technology has matched to a visual search.
Google Lens exists as an app in its own right, and is also available via the Google app and the Google Photos app. Google Lens is also built into the camera app of many Android devices. I use this feature when I’m out and about walking, such as to take photos of flowers and plants to find out their names.
The possible intent of a search using Google Lens is much broader than, say, that of Pinterest or the IKEA site, so Google groups search results across a number of tabs, including:
When an image includes text, Google will often include a ‘Search’ button. Clicking this button will prompt Google to use text pulled from the image to make a textual search:
Unlike visual search on Pinterest and retailers’ websites, it’s interesting to note that Google Lens is far from location agnostic.
Google makes it clear that when you agree to let Lens use your location, it uses that information to return more accurate results—for example, when identifying places and landmarks. So if you’re in Paris, Google Lens will know that it’s more likely you’re looking at the Eiffel Tower rather than a similar-looking structure somewhere else in the world.
It’s not a huge leap to assume that Google Lens might use your location to help populate your visual search results for local businesses. With this in mind, I did some light investigation into Google’s visual search capability for understanding the possible implications for local businesses when it comes to local search.
When Google Lens processed the text in an image of a product, I found examples of Google using elements of that copy to populate a SERP that included a 3-Pack:
Of the businesses that appear in the 3-Pack for this search, my eyes are immediately drawn to the third listing, which includes a photo taken at that location of headphones for sale.
Is visual search just the new voice search?
You might now be asking yourself, “Is visual search just the new voice search and therefore I don’t really need to do anything?”
We’re all familiar with the 2016 article speculating that voice search would account for 50% of searches by 2020. Of course, this didn’t happen, and we use voice search mostly to make simple commands for our digital assistants (“Hey Google, set an alarm for 6am”).
As marketers, we tend to think that visual search is going to make much more of an impact than voice search, especially when it comes to the types of results we can expect to see. One of the reasons for this is Google’s new technology—the ‘Multitask Unified Model’ (or Google MUM for short).
Google MUM is multimodal, which means it understands information across text and images. In the future, this will likely expand to more modalities like audio and video.
Google gives this example:
Google explains that eventually you might be able to take a photo of your hiking boots and ask, “can I use these to hike Mt. Fuji?” MUM would understand the image and connect it with your question to let you know your boots would work just fine. It could then even point you to a blog with a list of recommended gear.
What are the implications for local business marketers?
Even though the technology can sound complicated and futuristic, the implications for local business marketers are actually pretty simple.
It’s likely that more and more of our customers and potential customers will begin their journey with a visual image search. This means that images will become prominent and influential in these journeys, and that Google will increasingly be evaluating the content of our images in order to present the user with the most relevant results.
So, we’ll need to make sure that the images we use on our websites and in our GBPs appeal to, and are understood by, both humans AND the technologies employed by search engines.
10 things you can do now:
- To understand how Google ‘sees’ and understands images, use the Try it! Demo on your key images.
- If Google isn’t seeing what you want it to see, how do you need to change your image? See the sloth feeding example above for guidance.
- Make sure you really have made the most of the opportunity to upload images into your GBP. Keep these updated and make sure you include photos of your most important evergreen products and services. Google will display the most relevant image in the SERP next to your 3-Pack listing, so make sure you have relevant images for your most important search terms!
- Add non-stock images to ALL of the categories available in the photos section of your GBP (this depends on your primary category but might include: exterior, interior, product, ‘at work’, common areas, rooms, and team).
- Make sure that you have great quality photos on your website that accurately reflect your important products and services.
- Ensure that images on your website are embedded in the HTML rather than set as CSS backgrounds, as background images can’t be crawled and indexed by Google. Use alt attributes and optimize your image file names accordingly. Yes, Google is getting very adept at understanding the content of images, but take a belt-and-braces approach to ‘feeding the machine’.
- Optimize your images so that they load quickly and look good across a range of devices—keep in mind that Google displays images in the SERP in a square format. Also, place your image high up in the HTML content of the page.
- Consider your business attributes as well as your actual products and services. If you want to be known as ‘family friendly’ or ‘wheelchair accessible’, and if you want your staff to be thought of as ‘friendly and professional’ then make sure that the content of your photos reflects this.
- Optimizing your images for search isn’t a ‘one and done’ process—keep images updated, have a schedule for image upload, and audit your images regularly to make sure they continue to reflect the nature (and location!) of your business.
- Create an XML image sitemap for your website.
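The alt-attribute advice in the list above can be spot-checked with a short script. A minimal sketch using Python's built-in HTML parser to flag `<img>` tags with a missing or empty `alt` attribute; the markup and file names are illustrative:

```python
from html.parser import HTMLParser

class ImgAltChecker(HTMLParser):
    """Collect the src of <img> tags missing (or with empty) alt attributes."""
    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            if not attrs.get("alt"):
                self.missing_alt.append(attrs.get("src", "(no src)"))

# Illustrative page fragment: one image with alt text, one without.
page = """
<img src="oak-table.jpg" alt="Solid oak dining table seating six">
<img src="banner.jpg">
"""

checker = ImgAltChecker()
checker.feed(page)
print(checker.missing_alt)  # ['banner.jpg']
```

Running something like this over your key landing pages makes the audit step above repeatable rather than a one-off manual check.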
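The XML image sitemap mentioned in the list above follows the sitemaps.org protocol plus Google's image extension namespace. A minimal sketch that generates one with Python's standard library; the page and image URLs are hypothetical placeholders:

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the sitemaps.org protocol and Google's image extension.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMG_NS = "http://www.google.com/schemas/sitemap-image/1.1"

def build_image_sitemap(pages):
    """pages: dict mapping a page URL to the list of image URLs on that page."""
    ET.register_namespace("", NS)
    ET.register_namespace("image", IMG_NS)
    urlset = ET.Element(f"{{{NS}}}urlset")
    for page_url, images in pages.items():
        url = ET.SubElement(urlset, f"{{{NS}}}url")
        ET.SubElement(url, f"{{{NS}}}loc").text = page_url
        for img in images:
            image = ET.SubElement(url, f"{{{IMG_NS}}}image")
            ET.SubElement(image, f"{{{IMG_NS}}}loc").text = img
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical URLs for illustration only.
print(build_image_sitemap({
    "https://example.com/oak-furniture": [
        "https://example.com/images/oak-table.jpg",
        "https://example.com/images/oak-chairs.jpg",
    ],
}))
```

The resulting XML can be saved as, say, `image-sitemap.xml` and submitted through Search Console alongside your regular sitemap.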