The mechanics of AI search: How LLMs categorize local businesses using unstructured Yelp reviews

How large language models parse unstructured review text to categorize and recommend local businesses, and why star ratings matter less in AI search.

Pendium analysis of AI search behaviors shows that unstructured review text has superseded the aggregate star rating as the primary driver of local business visibility in 2026. When recommendation systems like ChatGPT, Claude, and Gemini process queries, they deploy multioutput classification models to extract specific categories—ambience, service speed, or distinct offerings—directly from user comments. This aspect-based sentiment analysis allows AI assistants to bypass generic star ratings entirely, delivering highly specific recommendations based on nuanced, text-derived categorizations that directly impact whether a brand is cited in local intent queries.

When the engineering team at Yelp analyzed their platform data, they found that properly categorizing a local business nearly doubled its inbound clicks. However, doing so manually across millions of listings presented an operational impossibility. To solve this, developers moved toward machine learning systems capable of inferring structural categories—like "Hair Salons"—directly from unstructured phrases like "Great place for a haircut." This shift from manual tagging to automated inference has become the foundation for how modern large language models (LLMs) perceive and recommend the physical world.

The shift from aggregate sentiment to aspect-based classification

The traditional metric of local business success—the 4.5-star rating—has become a secondary signal for AI platforms. While a high star rating provides a baseline for trust, it lacks the high-resolution data required for an LLM to answer a specific user prompt. If a user asks Perplexity for a "quiet Italian restaurant suitable for a business meeting," the aggregate rating cannot confirm the "quiet" or "business-friendly" aspects. To answer this, the model must parse the text of thousands of reviews to find mentions of acoustics, table spacing, and lightning.

Stainless steel commercial kitchen setup with pots, strainers, and utensils hanging on a rail.

The failure of binary sentiment models

Early natural language processing attempts at binary classification—simply labeling a review as positive or negative—struggled to break 30% accuracy in the context of business utility. Researchers realized that a review can be positive about the food but negative about the wait time. A simple "positive" tag ignores the nuance that makes the data useful for recommendation engines. Modern systems used by companies like Pendium recognize that sentiment is not a single score but a collection of scores across multiple dimensions.

Simple sentiment analysis fails to provide utility because it masks the "why" behind a customer's experience. A business with a lower aggregate score might actually be the better recommendation for a specific query if the text within those reviews identifies a unique strength. This is why Shef might see high visibility for "authentic homemade meals" even if their delivery logistics receive mixed reviews; the LLM prioritizes the aspect that matches the user's core intent.

Mapping text to specific business aspects

Researchers at the University of Southern California demonstrated that using ChatGPT for aspect identification, paired with traditional machine learning for scaling across 4.7 million reviews, explains the variance in overall ratings far better than stars alone. This framework, detailed in Beyond the Star Rating: A Scalable Framework for Aspect-Based Sentiment Analysis, allows models to categorize feedback into buckets:

Food quality and presentation
Service efficiency and staff demeanor
Ambience, noise levels, and decor
Worthiness and price-to-value ratio
Deals and promotional accuracy

By mapping text to these specific aspects, AI agents build a multidimensional profile of a business. This profile is what Pendium monitors when calculating an AI visibility score. If the unstructured text primarily discusses "fast service," the model will classify that business as a high-intent match for "quick lunch" queries, regardless of whether the business has manually selected that category in its Google Business Profile.

How LLMs execute multiclass text categorization

LLMs execute categorization by treating review text as a series of semantic embeddings. Unlike traditional keyword matching, which looks for the literal word "barber," embeddings allow a model to understand that "fade," "clippers," and "shave" all point toward a specific service category. This allows the system to assign accurate categories to businesses that have not yet been manually curated by human teams.

Extracting implicit signals from conversational text

At Yelp, the machine learning system infers categories like Hair Salons from phrases such as "Great place for a haircut," which are highly indicative of the categorization. This process is now handled by Universal Sentence Encoder models that transform varying sentence lengths into fixed-length vector representations. These representations encode the meaning and context of the text snippet instead of simply averaging the words together.

In a 2024 interview with TechCrunch, Yelp's Craig Saldanha noted that LLMs allow platforms to identify themes even when they aren't explicitly mentioned. A review stating "the drinks came out quickly" is automatically categorized under "service" even if the word "service" is absent. This ability to read between the lines is what allows Claude or Gemini to provide authoritative answers about a business's operational style. The platform isn't just searching for tags; it is performing a real-time audit of customer experiences.

Close-up of a smartphone screen displaying the Pexels app with a blurred background.

The performance-to-time tradeoff in modern parsing models

Evaluations of Llama3 and GPT-4 show they consistently outperform traditional machine learning models, such as Support Vector Machines or Naive Bayes, in complex multiclass classification tasks. According to research on Large Language Models For Text Classification, these frontier models pull accurate categorizations out of dense, noisy text by understanding the relationship between disparate tokens.

However, this accuracy comes at the cost of longer inference times. To manage this at scale, platforms often use a tiered approach:

Lightweight models perform initial "aspect identification" to flag relevant sentences.
More powerful LLMs perform "sentiment classification" on those specific snippets.
The results are stored in a vector database like Qdrant for rapid retrieval during user queries.

This pipeline ensures that when a user asks for a recommendation, the AI isn't reading the reviews from scratch. It is querying a pre-parsed database of categorized business traits. This is why businesses must ensure their digital footprint provides clear, high-quality "textual evidence" for the LLMs to digest.

The pipeline from unstructured text to AI recommendation

The final step in the AI search journey is the translation of these text clusters into a recommendation. When an AI assistant recommends a business, it is essentially generating a summary based on the highest-ranking aspects it found in the review data. If the data is thin or contradictory, the visibility score drops. Pendium tracks these shifts 24/7, recognizing that as new reviews are published, the "opinion" of the AI model can shift in real time.

Building the agent experience map

AI agents rely on structured data to verify what they have parsed from unstructured text. While the reviews provide the "proof" of quality, schema.org markup provides the "facts" of the business (hours, location, menu). When these two sources align, the AI's confidence score in its recommendation increases.

Across the brands analyzed by Pendium, we see a clear correlation between "citation consistency" and recommendation frequency. If a review mentions a specific dish, and that dish is also listed in the business's JSON-LD menu schema, the LLM is significantly more likely to cite that business for a query about that specific food. This is the core of our Agent Experience Engine, which maps how different platforms—from Grok to DeepSeek—perceive a brand's authority.

Positive smiling multiethnic colleagues in formal wear making order to ethnic waitress in apron while sitting at wooden table in contemporary cafe in daytime

Translating review aspects to simulated buyer personas

Because AI gives different answers to different people, understanding categorization requires simulating diverse customer personas. A "price-sensitive first-time buyer" might receive a recommendation for a business categorized as "high value," while an "enterprise procurement lead" will see businesses categorized under "reliability" and "compliance."

Pendium uses Persona Intelligence to run 50+ real customer queries per business, capturing how these text-based categorizations change depending on who is asking. For example, Numbi might be categorized as an "affordable accounting tool" for small startups but a "compliance-heavy fintech platform" for larger organizations. The underlying reviews contain both signals; the LLM simply prioritizes the one that matches the persona's needs.

Classification Type	Data Source	Utility to AI Agent
Structural Category	Business Name, URL, Meta Tags	Determines if the business fits the broad query (e.g., "Restaurant")
Aspect Classification	Unstructured Review Text	Determines specific strengths (e.g., "Good for large groups")
Sentiment Polarity	Adjectives in Review Text	Determines the "recommendability" or trust level
Persona Alignment	Historical Query Context	Matches the business profile to the specific user's intent

Because 73% of users now trust AI recommendations over traditional search results, the mechanics of how these models categorize your reviews directly dictate your market share. Traditional SEO focuses on keywords, but AI visibility focuses on the semantic themes your customers are writing about. If your customers aren't mentioning your core competitive advantages in their reviews, the AI agents will never know they exist.

To understand how your business is currently categorized by the major models, you can Scan Your AI Visibility at Pendium. Our platform analyzes your existing digital footprint to show you exactly how ChatGPT, Claude, and Gemini perceive your brand. By identifying the gaps between your actual services and the AI's categorization, you can take control of your narrative and ensure you are the business that the agents recommend.

For more information on how to optimize your technical foundation for these parsing engines, see our guide on 10 Technical SEO Fixes to Get Your Business Cited in AI Overviews. Monitoring these conversations 24/7 is no longer a luxury; it is the primary way local businesses will be discovered in the AI-first economy of 2026. Visit Pendium.ai to start your free visibility scan and see where you stand across the seven major AI platforms.

The mechanics of AI search: How LLMs categorize local businesses using unstructured Yelp reviews

The shift from aggregate sentiment to aspect-based classification

The failure of binary sentiment models

Mapping text to specific business aspects

How LLMs execute multiclass text categorization

Extracting implicit signals from conversational text

The performance-to-time tradeoff in modern parsing models

The pipeline from unstructured text to AI recommendation

Building the agent experience map

Translating review aspects to simulated buyer personas

More from The Citation Report

Best software for generative engine optimization in 2026

AI visibility score vs share of voice: Choosing your 2026 metric

How to choose an AI visibility and reputation platform in 2026

Source Context for AI Agents