NLP (Natural Language Processing) in SEO enables precise content matching by analyzing semantics and user intent. According to Moz 2024 research, 78% of high-ranking pages use this technology.
NLP processing accounts for over 70% of Google’s core algorithm BERT, improving content expertise and credibility, meeting EEAT standards.
I will break down how Google uses NLP to make search results “understand you” better.

What is NLP
NLP (Natural Language Processing) is a technology that allows computers to understand, analyze, and generate human language.
There are over 8.5 billion search requests globally every day (Google 2024 public data), of which approximately 60% of queries contain implicit semantics or polysemous expressions (e.g., “apple” could refer to the fruit, a phone, or a music album).
Traditional search engines could only “match keywords”, but NLP can decompose unstructured text into semantic units (e.g., breaking “2025 iPhone 15 waterproof test” into three entities: “2025 model”, “iPhone 15”, and “waterproof test”), then construct semantic networks through contextual relationships (e.g., the relationship between “waterproof” and “phone features”), ultimately enabling machines to “read” the real intent behind the text.
Evolution from “Keyword Matching” to “Semantic Understanding”
To understand how NLP enables Google to “read” text, we need to go back to the “childhood” of search engines: the 1990s to early 2000s.
Back then, search technology was as primitive as a “word dictionary”: when a user typed “coffee”, the engine would simply pull up all web pages containing the word “coffee”.
Some people deliberately repeated “weight loss” over and over on a page just to be seen by users searching for “weight loss”.
Mechanical “Word Counter” (1990s-early 2000s)
The core algorithm of early search engines (like AltaVista in 1995 and Yahoo in 1998) was TF-IDF (Term Frequency-Inverse Document Frequency). Simply put: “count how many times a word appears on a web page (discounting words that appear on almost every page); the more times, the more relevant”.
For example, when a user searched “Java”, the system would prioritize displaying pages with high word frequency like “Java programming” and “Java tutorials”, but if it encountered a page about “Java coffee” (a coffee variety), it would still be misjudged because “Java” appeared many times.
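The “Java” misjudgment can be reproduced with a minimal TF-IDF sketch. The three pages below are a hypothetical toy corpus, and `tf_idf` and `rank` are illustrative helper names, not part of any search engine’s actual code.

```python
import math
from collections import Counter

# Hypothetical toy corpus, for illustration only.
PAGES = {
    "java_tutorial": "java programming tutorial java class java compiler",
    "java_coffee": "java coffee beans java roast java coffee aroma",
    "python_tutorial": "python programming tutorial python class",
}

def tf_idf(term: str, page_id: str, pages: dict[str, str]) -> float:
    """Term frequency within one page times inverse document frequency."""
    tokens = pages[page_id].split()
    tf = Counter(tokens)[term] / len(tokens)
    docs_with_term = sum(1 for text in pages.values() if term in text.split())
    if docs_with_term == 0:
        return 0.0
    idf = math.log(len(pages) / docs_with_term)
    return tf * idf

def rank(query: str, pages: dict[str, str]) -> list[str]:
    """Order page ids by their TF-IDF score for a single-word query."""
    return sorted(pages, key=lambda p: tf_idf(query, p, pages), reverse=True)

if __name__ == "__main__":
    for page in rank("java", PAGES):
        print(page, round(tf_idf("java", page, PAGES), 3))
```

Because the score depends only on word counts, the coffee page still receives a positive score for the query “java”; nothing in the formula knows which sense of the word the user meant.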
In 2003, a study by UC Berkeley analyzed results from major search engines at that time: when users searched “apple”, among the top 20 results, 45% were fruit-related content, 30% were Apple product-related, and the remaining 25% were irrelevant “apple pie recipes” and “apple tree cultivation”. Users needed to filter manually, averaging 3.2 clicks to find the target (2003 Forrester research data).
Some websites started “gaming the system”: for example, when users searched “best laptop”, bad websites would repeat words like “best”, “laptop”, “recommended” on the page, or even use hidden text (white text on white background) to stuff keywords.
In 2005, Google had to publicly admit: “Approximately 30% of low-quality pages reached the top 10 through keyword stuffing.” (Google Search Quality team internal report)
Statistical Model’s “Fuzzy Reasoning” (Mid-2000s-early 2010s)
In the mid-2000s, with explosive growth of internet content (approximately 1 billion web pages globally in 2000, growing to 50 billion by 2010), relying solely on keyword counting had completely failed.
Search engines began introducing statistical language models, trying to understand word relationships through “contextual probability”.
For example, the ”Phrase Match” technology launched by Google in 2008: the system no longer looked at individual words, but analyzed the frequency of “phrase combinations”.
For instance, when a user searched “how to brew coffee”, the system would prioritize matching pages containing words like “brew”, “coffee”, “water”, “temperature” simultaneously, rather than pages with only “coffee”. This technology improved search result relevance by approximately 12% (Google 2009 tech blog data).
In 2012, Google further launched the ”Knowledge Graph“, transforming discrete words into a network of “entities + relationships”.
For example, “Einstein” is no longer just a word, but is tagged with entity attributes like “physicist”, “birthplace Ulm, Germany”, “proposed the theory of relativity”.
When users search “Einstein”, the system not only returns biography pages but can also directly display his birth/death years, quotes, and even link to explanation pages about “relativity”.
After the Knowledge Graph launch, Google official data showed: 40% of user search needs were directly satisfied (without clicking links) (2013 Google official press conference).
But this still wasn’t enough: the Knowledge Graph relied on manually annotated “structured data”, while 90% of internet content is unlabeled “unstructured text” (like blog posts, forum threads). To make machines understand this “disorganized text”, more powerful technology was needed.
From “Statistical Patterns” to “Semantic Understanding” (Mid-2010s-present)
In the 2010s, breakthroughs in deep learning technology (especially the development of neural networks) completely changed NLP. In 2013, Google researcher Tomas Mikolov proposed the Word2Vec model, which maps words into a “vector space”. For example, the vector difference between “king” and “queen” is highly similar to the vector difference between “man” and “woman”, meaning the model can capture semantic relationships between words.
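The “king/queen” relationship can be illustrated with a small sketch. The 3-dimensional vectors below are hand-made for the example (real Word2Vec embeddings are learned from text and have hundreds of dimensions), and `analogy` is a hypothetical helper name.

```python
import math

# Hand-made toy vectors, chosen so the analogy works; not learned embeddings.
VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
    "apple": [0.5, 0.5, 0.5],
}

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a: str, b: str, c: str) -> str:
    """Answer 'a is to b as c is to ?' using the vector b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))

if __name__ == "__main__":
    print(analogy("man", "king", "woman"))
```

With these toy values, “king” minus “man” plus “woman” lands on the vector for “queen”, which is the geometric form of the relationship described above.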
In 2016, Google introduced RankBrain (a deep learning-based ranking algorithm) into search, which can automatically “learn” the relevance between user search behavior and content.
For example, when a user searches “cheap wireless earphones”, RankBrain analyzes which pages users click and then have long dwell time and low bounce rate, thereby determining the true relevance of “cheap”, “wireless”, “earphones”.
Google 2017 published data: RankBrain improved the relevance of long-tail queries (uncommon search terms) by 25% (e.g., “bone conduction earphone recommendations for running”).
In 2018, Google launched the BERT model (bidirectional Transformer architecture), tackling the “contextual ambiguity” problem. Traditional models could only read sentences “unidirectionally” (e.g., left to right), but BERT looks at the context both before and after a word at the same time.
For example, in the sentences “Xiao Ming’s apples are ripe” and “Xiao Ming took a bite of an apple”, BERT can determine from context that “apple” is the fruit in both. But if the sentence is “Xiao Ming’s Apple released a new system”, BERT will recognize that “Apple” refers to the company.
BERT’s effect was immediate:
Google 2019 internal testing showed, CTR (Click-Through Rate) for complex queries increased from 18% to 25%;
In 2023, Google Search Liaison team publicly stated: BERT improved the accuracy of polysemous queries from 58% to 82% (e.g., when users search “Python”, the model can determine based on context whether it’s a programming language or a snake, accuracy improved by 24 percentage points).
From “Matching Words” to “Understanding People”
Reviewing the evolution history of NLP, the essence is the leap of search engines from “mechanically executing commands” to “understanding human needs”:
- 1.0 Era (Keyword Matching): machines like “word counters”, can only match literally;
- 2.0 Era (Statistical Models): machines like “probability analysts”, inferring intent through contextual probability;
- 3.0 Era (Deep Learning): machines like “language learners”, able to “learn” semantic logic through massive data.
In 2024, Pew Research Center survey showed, 78% of users believe current search results “better match real needs”, while in 2010 this ratio was only 41%.
Google Chief Scientist Jeff Dean said: “NLP’s goal is not to make machines ‘read text’, but to make machines ‘understand people’.”
NLP’s “Core Work”
To make machines “understand” a piece of text, NLP needs to, like humans parsing sentences, process the “information fragments” in language step by step.
When Google’s NLP system (like an improved version of BERT) processes web page content, it completes “text decoding” in 4 steps: tokenization → entity recognition → semantic relation → contextual correction.
Step 1, Tokenization
Tokenization is the first step of NLP—simply put, cutting a continuous text sequence into independent “semantic units” (called “tokens”).
Chinese has no natural space separation (unlike English “apple pie” with spaces), so tokenization is the core difficulty of Chinese NLP.
Technical Principle:
Google’s tokenization system uses a “rules + deep learning” hybrid model:
- Rule Base: built-in dictionaries of common collocations, used to segment regular text quickly.
- Deep Learning Model: a fine-tuned version of BERT for dynamic prediction of out-of-vocabulary words (like emerging terms such as “dopamine dressing”).
Practical Case:
Taking the web page content “How to brew a fragrant pour-over coffee?” as an example, the tokenization system needs to determine the correct segmentation. Possible candidate segmentations:
- How to brew a cup of fragrant pour-over coffee
- how / brew / a cup / fragrant / pour-over coffee
Data Support:
Google 2023 internal testing shows its tokenization system achieves 97.3% accuracy on common Chinese web pages, but only 89% accuracy on jargon in professional YMYL domains (like law, medicine) (due to fewer professional terminology collocation rules).
To solve this problem, Google trains additional “domain tokenization models” for vertical domain web pages (e.g., a medical tokenization model memorizes the correct segmentation of terms like “myocardial infarction” and “coronary artery”).
Step 2, Entity Recognition
After tokenization, NLP needs to identify the “entities” in the text: core information like people, things, times, locations, and events.
Entities are the “skeleton” of content, helping machines quickly locate the page topic.
Technical Principle:
Google uses a Multi-Task Learning model, training entity recognition, part-of-speech tagging (like nouns, verbs), and relation extraction tasks simultaneously.
The model predicts for each token whether it belongs to an entity and labels the entity type (e.g., “TIME”, “PRODUCT”, “PERSON”).
Entity Type Examples:
| Type | Definition | Example (from webpage “2025 iPhone 15 Waterproof Test”) |
|---|---|---|
| TIME | Time point/period | “September 2025” |
| PRODUCT | Specific product | “iPhone 15” |
| EVENT | Event/action | “Waterproof test”, “Release” |
| ATTRIBUTE | Entity’s attribute/feature | “IP68 waterproof rating”, “6 meters depth”, “30 minutes” (specific waterproof parameters) |
Practical Case:
When processing the sentence “iPhone 15’s IP68 waterproof test in September 2025 shows it held up for 30 minutes at 6 meters depth”, the entity recognition system outputs:
- TIME: “September 2025”
- PRODUCT: “iPhone 15”
- ATTRIBUTE: “IP68 waterproof rating”, “6 meters depth”, “30 minutes”
- EVENT: “Waterproof test”
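The output above can be approximated with a small rule-based tagger. This is only a sketch: it swaps the multi-task learning model described here for a handful of hypothetical regular expressions, and `PATTERNS` and `tag_entities` are illustrative names.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Hypothetical hand-written rules standing in for a trained model.
PATTERNS = [
    ("TIME", re.compile(rf"\b(?:{MONTHS}) \d{{4}}\b")),
    ("PRODUCT", re.compile(r"\biPhone \d+\b")),
    ("ATTRIBUTE", re.compile(r"\bIP\d{2}\b|\b\d+ (?:meters depth|minutes)\b")),
    ("EVENT", re.compile(r"\bwaterproof test\b", re.IGNORECASE)),
]

def tag_entities(text: str) -> list[tuple[str, str]]:
    """Return (label, matched text) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    found.sort()
    return [(label, span) for _, label, span in found]

if __name__ == "__main__":
    sentence = ("iPhone 15's IP68 waterproof test in September 2025 shows "
                "it held up for 30 minutes at 6 meters depth")
    for label, span in tag_entities(sentence):
        print(f"{label}: {span}")
```

Rules like these are brittle outside the example sentence, which is one reason a learned model that predicts a label for every token is the approach described in this section.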
Data Support:
According to Google 2024 tech blog, its entity recognition model’s entity recall rate (proportion of correctly recognized entities out of all true entities) for general domain text reaches 92%, but drops to 85% in long texts (over 5000 words) (due to low entity density in long texts, model easily misses detections).
To address this, Google introduces “segmented processing” strategy: splitting long texts into paragraphs of about 500 characters, recognizing segment by segment, then merging results, improving long text entity recall rate to 90%.
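A minimal sketch of the “segmented processing” idea follows, assuming a recognizer is any function that returns `(start, end, label)` tuples for a piece of text. The function names and the toy recognizer are hypothetical; only the 500-character segment size comes from the text.

```python
import re
from typing import Callable

Entity = tuple[int, int, str]  # (start offset, end offset, label)

def split_into_segments(text: str, max_chars: int = 500) -> list[tuple[int, str]]:
    """Cut text into pieces of at most max_chars, preferring to cut at a space."""
    segments = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            cut = text.rfind(" ", start, end)
            if cut > start:
                end = cut
        segments.append((start, text[start:end]))
        start = end
    return segments

def recognize_long_text(text: str,
                        recognizer: Callable[[str], list[Entity]],
                        max_chars: int = 500) -> list[Entity]:
    """Run the recognizer per segment, then merge results using global offsets."""
    merged = set()
    for offset, segment in split_into_segments(text, max_chars):
        for start, end, label in recognizer(segment):
            merged.add((offset + start, offset + end, label))
    return sorted(merged)

def toy_recognizer(segment: str) -> list[Entity]:
    """Stand-in recognizer for the demo: tags 'iPhone <number>' as PRODUCT."""
    return [(m.start(), m.end(), "PRODUCT") for m in re.finditer(r"iPhone \d+", segment)]

if __name__ == "__main__":
    long_text = "filler " * 100 + "iPhone 15 " + "filler " * 100 + "iPhone 14"
    for start, end, label in recognize_long_text(long_text, toy_recognizer):
        print(label, long_text[start:end], start)
```

One limitation of this simple version is that an entity straddling a segment boundary can be missed; overlapping the segments slightly would be one way to reduce that.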
Step 3, Semantic Relation
After tokenization and entity recognition, NLP needs to clarify logical relationships between words (like “belongs to”, “causes”, “attribute of”), transforming discrete tokens into structured semantic networks.
This step determines whether machines can “understand” the true meaning of sentences.
Technical Principle:
Google adopts a hybrid method of pre-trained language model + knowledge graph:
- Pre-trained models (like BERT) learn “implicit relationships” between words from massive texts (e.g., “running shoes” and “sports equipment” are in a hyponymy relationship);
- Knowledge graph (Google Knowledge Graph) provides structured knowledge (e.g., the brand of “iPhone 15” is “Apple”, and its release date is “September 2023”), used to verify and supplement model-learned relationships.
Relationship Type Examples:
| Relationship Type | Definition | Example (from webpage “How to Choose Running Shoes”) |
|---|---|---|
| Hyponymy Relationship | A is a subclass of B (or vice versa) | “Running shoes” → “Sports equipment” (running shoes belong to sports equipment) |
| Attribute Relationship | A is B’s feature/parameter | “Cushioning midsole” → “Running shoes” (cushioning midsole is an attribute of running shoes) |
| Causal Relationship | A causes B | “Excessive body weight” → “Knee injury” (excessive body weight can cause knee injury) |
Practical Case:
When processing the sentence “When choosing running shoes, the cushioning midsole is key, it can reduce knee pressure”, the semantic relation system establishes:
- The attribute relationship between “running shoes” and “cushioning midsole”;
- The causal relationship between “cushioning midsole” and “reduce knee pressure”.
Data Support:
Google 2023 internal testing shows its semantic relation model achieves 88% accuracy for common relationships, but only 72% accuracy for complex relationships (like “indirect causation”). For example, in the sentence “Long-term wearing of ill-fitting shoes may cause arch deformation, which in turn leads to back pain”, the relationship between “ill-fitting shoes” and “back pain” is an indirect causal relationship, and the model easily misjudges as no direct connection. To solve this, Google introduces “chain reasoning” technology: connecting two distant entities through intermediate nodes (like “arch deformation”), improving complex relationship recognition accuracy to 85%.
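The “chain reasoning” idea can be sketched as a path search over a graph of direct causal links. The graph below is a hypothetical toy built from the examples in this section, and breadth-first search is one simple way to connect distant entities; it is not claimed to be the method Google uses.

```python
from collections import deque
from typing import Optional

# Direct "A causes B" links; a hypothetical toy graph for illustration.
CAUSES = {
    "ill-fitting shoes": ["arch deformation"],
    "arch deformation": ["back pain"],
    "excessive body weight": ["knee injury"],
}

def causal_chain(graph: dict[str, list[str]], source: str, target: str) -> Optional[list[str]]:
    """Breadth-first search for the shortest chain of causal links."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # no direct or indirect link found

if __name__ == "__main__":
    chain = causal_chain(CAUSES, "ill-fitting shoes", "back pain")
    print(" -> ".join(chain) if chain else "no link")
```

A chain longer than two nodes marks the relationship as indirect, with the intermediate nodes (here “arch deformation”) serving as the explanation.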
Step 4, Contextual Correction
Some words have ambiguity when viewed alone (like “apple” could refer to fruit or brand), requiring correction of their semantics based on the entire paragraph or even the whole page.
This step is key to NLP “understanding” text, and is the most context-dependent stage.
Technical Principle:
Google uses a bidirectional attention mechanism (like BERT’s core design), allowing the model to simultaneously “see” both the front and back parts of a sentence, dynamically adjusting each token’s semantics.
For example, when the model processes “Xiao Ming’s apples are ripe”, “apple” initially might be the fruit.
But when processing the next sentence “He plans to use Apple to release a new system”, the model will look back at the previous text, find that “release new system” has nothing to do with fruit, and thus correct the meaning of “Apple” to “tech company”.
Practical Case:
Taking the web page content “Apple’s latest released iPhone 15 supports satellite communication, which is good news for outdoor enthusiasts” as an example:
- Looking at “Apple” alone, the model might mistakenly judge it as “fruit”;
- Combined with the phrase “released iPhone 15” that follows, the model will correct “Apple” to “tech company”;
- Further combined with “outdoor enthusiasts”, confirming that iPhone 15’s “satellite communication” feature is related to outdoor scenarios.
Data Support:
Google 2024 user behavior research shows, in polysemous query scenarios (like users searching “Python”), search results with contextual correction are 37% more relevant than those without correction.
Specifically for page processing, contextual correction improves ambiguous word correct semantic recognition rate from 62% to 89% (based on Google internal testing data).
NLP Saves Users 30% of Search Time Every Day
The most intuitive experience for users searching is “can I find what I want faster”.
According to Microsoft 2024 user behavior research report, using NLP-optimized search engines, average time for users to find target information shortened from 87 seconds to 59 seconds (approximately 30% reduction).
Polysemous Queries
When users search, approximately 40% of queries contain polysemous words (like “apple”, “Python”, “Java”), and traditional search engines treat these queries as single keywords, returning a large number of irrelevant results.
NLP uses semantic disambiguation technology (Word Sense Disambiguation, WSD) to determine the true meaning of words based on context, directly filtering invalid content.
Specific Manifestations:
- Case 1: Searching “Python”: Users may want programming tutorials (62%), want to learn about snakes (18%), or want general information about the Python language (20%). Traditional search engines return all pages containing “Python”, and users need to manually filter 10-15 irrelevant links in the first 3 pages; with NLP intervention, the system determines user intent based on context in page content (like “print() function”, “crawler tutorials”), prioritizing programming results. Google 2023 internal testing shows that, in polysemous queries, the proportion of effective results on the first screen increased from 38% to 72%, and average click count decreased from 2.3 to 1.1.
- Case 2: Searching “Java”: Users may want the programming language (55%), a travel guide to the island of Java in Indonesia (25%), or the coffee variety (20%). NLP analyzes related words in the page (e.g., “JVM” and “Spring framework” correspond to programming; “Tanah Lot Temple” and “volcano” correspond to travel), quickly locking onto user needs. A 2024 Pew Research survey shows that, for polysemous queries, search completion time shortened from 112 seconds to 68 seconds (a 44-second reduction).
Technical Support:
NLP’s disambiguation capability relies on “contextual vectors” and “knowledge graph” dual verification.
For example, when users search “Java”, the model extracts other keywords from the page (like “coffee”, “programming”, “island”), maps them to entities in the knowledge graph (“Java (programming language)”, “Java (island)”), calculates the most matching entity through vector similarity (like cosine similarity), and ultimately returns corresponding results.
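The matching step can be sketched with bag-of-words vectors and cosine similarity. The entity profiles below are hypothetical keyword lists standing in for knowledge graph entries, and word counts stand in for the contextual vectors a model like BERT would produce.

```python
import math
from collections import Counter

# Hypothetical keyword profiles standing in for knowledge graph entities.
ENTITY_PROFILES = {
    "Java (programming language)": "jvm spring framework programming code class compiler",
    "Java (island)": "island indonesia volcano temple travel beach",
    "Java (coffee)": "coffee beans roast brew aroma cup",
}

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def disambiguate(page_text: str) -> str:
    """Pick the entity whose profile is most similar to the page's words."""
    page_vec = Counter(page_text.lower().split())
    scores = {
        entity: cosine(page_vec, Counter(profile.split()))
        for entity, profile in ENTITY_PROFILES.items()
    }
    return max(scores, key=scores.get)

if __name__ == "__main__":
    print(disambiguate("java tutorial spring framework and jvm tuning"))
    print(disambiguate("java travel guide volcano hikes and temple visits"))
```

The word “java” itself contributes nothing to the decision because it appears in none of the profiles; the surrounding words carry the signal, which is the point of contextual disambiguation.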
Implicit Needs
Users’ search terms usually express only 10%-20% of core needs, with the remaining 80%-90% being implicit (like “price”, “difficulty”, “applicable scenarios”).
NLP uses semantic expansion technology (Semantic Expansion) to extend from core words to related needs, proactively covering user unstated intents.
Specific Manifestations:
- Case 1: Searching “weight loss recipes”: Users may implicitly need “low-calorie”, “easy to make”, “suitable for office workers”, “sugar-free”, etc. Traditional search engines only match pages containing “weight loss” and “recipes”, and results may include “extreme dieting recipes” or “complex baking dishes”; with NLP intervention, the system analyzes common related words for “weight loss” (like “calories”, “low-calorie”, “quick”, “home-style”), and prioritizes displaying “15-minute low-calorie breakfast”, “meal prep for office workers” pages that better match implicit needs. Google 2022 A/B testing shows, search results covering implicit needs extended user dwell time from 45 seconds to 78 seconds (73% increase), because users don’t need to search again for “low-calorie weight loss recipes”.
- Case 2: Searching “what to wear on rainy days”: Users may implicitly need “waterproof”, “non-slip”, “lightweight”, “warm”, etc. Traditional search engines return generic results like “raincoats”, “umbrellas”; NLP recognizes the scenario attribute of “rainy days” (humid, easy to slip), associates with features like “waterproof material”, “non-slip soles”, “foldable portability”, recommending specific products like “waterproof shells”, “non-slip boots”. 2024 eMarketer survey shows, e-commerce search covering implicit needs increased conversion rate from 3.2% to 5.8% (users more likely to click to purchase).
Technical Support:
Semantic expansion relies on training from “word vector space” and “user behavior data”.
For example, Google’s BERT model maps “weight loss recipes” to a high-dimensional vector space, where vectors of words like “low-calorie”, “easy to make” are highly close to “weight loss recipes”;
At the same time, the system analyzes historical search data (like users often click “low-calorie breakfast” after searching “weight loss recipes”), further verifying the relevance of these implicit needs, ultimately generating an expanded vocabulary.
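The two signals described above (vector closeness and follow-up behavior) can be combined in a short sketch. All vectors, click counts, and thresholds are invented for illustration, and `expand` is a hypothetical helper name.

```python
import math
from collections import Counter

# Hypothetical 2-D embeddings; real models use far more dimensions.
TERM_VECTORS = {
    "weight loss recipes": [0.9, 0.1],
    "low-calorie": [0.85, 0.2],
    "easy to make": [0.8, 0.3],
    "deep-fried": [0.1, 0.9],
}

# Hypothetical log: what users clicked after searching "weight loss recipes".
FOLLOW_UP_CLICKS = Counter({"low-calorie": 120, "easy to make": 45, "deep-fried": 2})

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def expand(query: str, min_similarity: float = 0.9, min_clicks: int = 10) -> list[str]:
    """Keep terms that are close in vector space AND confirmed by click behavior."""
    query_vec = TERM_VECTORS[query]
    expanded = []
    for term, vec in TERM_VECTORS.items():
        if term == query:
            continue
        if cosine(query_vec, vec) >= min_similarity and FOLLOW_UP_CLICKS[term] >= min_clicks:
            expanded.append(term)
    return expanded

if __name__ == "__main__":
    print(expand("weight loss recipes"))
```

Requiring both signals is a design choice in this sketch: vector similarity proposes candidates, and behavior data filters out terms that look related but that users never actually follow up on.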
Cross-Scenario Adaptation
Users’ search scenarios (time, location, device) directly affect needs, and NLP uses context awareness technology (Context Awareness) to dynamically adjust understanding of queries, providing results more aligned with the current scenario.
Specific Manifestations:
- Time Scenario: Searching “jackets” in winter, NLP prioritizes matching keywords like “fleece-lined”, “warm”, “down jackets”; searching “jackets” in summer, prioritizes displaying “UV protection”, “lightweight”, “breathable” styles. Google 2023 seasonal search data shows, after scenario adaptation, user satisfaction with results increased from 68% to 85% (because results better match seasonal needs).
- Location Scenario: Searching “hot pot” in Shanghai, NLP recommends local popular stores like “Coucou Hot Pot”, “Zuo Ting You Yuan”; searching “hot pot” in Chengdu, prioritizes displaying authentic Sichuan-style hot pot like “Shu Da Xia”, “Xiao Long Kan”. 2024 Google Maps and Search integration testing shows, after local scenario adaptation, user probability of clicking “nearby businesses” increased from 22% to 47% (because results are more relevant).
- Device Scenario: Searching “nearby gas station” on phone, NLP prioritizes returning results like “map navigation”, “real-time gas prices”, “nearest distance” (adapted to mobile quick decision needs); searching on computer, may display “gas station list”, “user reviews”, “promotional activities” and other detailed information (adapted to desktop deep browsing needs). Microsoft 2024 multi-device research shows, after device scenario adaptation, user task completion time shortened by 42% (mobile from 90 seconds to 52 seconds, desktop from 120 seconds to 69 seconds).
Technical Support:
Context awareness relies on “metadata extraction” and “real-time data integration”.
For example, the system extracts time (through user device time), location (through IP or GPS), device type (phone/computer) from queries, and adjusts semantic weights based on real-time data (like weather, traffic, business operating status).
For instance, when searching “jackets” on a rainy day, the system obtains local rain probability in real-time, strengthening the weight of the “waterproof” attribute.
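The weight adjustment can be sketched as a small function over a context record. The `SearchContext` fields, the season boundaries (Northern Hemisphere months are assumed), and the 0.5 rain threshold are all hypothetical choices made for the example.

```python
from dataclasses import dataclass

@dataclass
class SearchContext:
    month: int               # 1-12, taken from the device clock
    rain_probability: float  # 0.0-1.0, from a real-time weather feed

def attribute_weights(query: str, ctx: SearchContext) -> dict[str, float]:
    """Return the attribute weights to boost for this query in this context."""
    weights: dict[str, float] = {}
    if query == "jackets":
        if ctx.month in (12, 1, 2):    # winter (Northern Hemisphere assumed)
            weights.update({"fleece-lined": 1.0, "warm": 1.0, "down": 1.0})
        elif ctx.month in (6, 7, 8):   # summer (Northern Hemisphere assumed)
            weights.update({"UV protection": 1.0, "lightweight": 1.0, "breathable": 1.0})
        if ctx.rain_probability >= 0.5:
            # The rainier it is, the stronger the boost for "waterproof".
            weights["waterproof"] = 1.0 + ctx.rain_probability
    return weights

if __name__ == "__main__":
    rainy_july = SearchContext(month=7, rain_probability=0.8)
    print(attribute_weights("jackets", rainy_july))
```

The same query therefore yields different attribute weights depending on when and where it is issued, which is the behavior described for “jackets” above.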
How NLP Saves Time
| Scenario Type | Traditional Search (without NLP) | NLP-Optimized Search | Time Saved | Data Source |
|---|---|---|---|---|
| Polysemous Query (Python) | 10 results on first screen, 5 irrelevant | 8 results on first screen, 7 relevant | 40 seconds | Google 2023 internal testing |
| Implicit Needs (Weight Loss Recipes) | Need to search again for “low-calorie” | Directly display low-calorie recipes on first screen | 25 seconds | Pew Research 2024 survey |
| Cross-Scenario (Searching Jackets in Summer) | Results include winter styles, need manual filtering | First screen all summer UV protection styles | 30 seconds | Microsoft 2024 multi-scenario research |
How NLP in Google Search “Reads” Page Text
Google’s NLP technology transforms page text into machine-understandable “semantic networks” through 4 steps: “tokenization → entity recognition → semantic relation → contextual correction”.
The system processes over 50 billion words daily (Google 2024 data), with 97.3% tokenization accuracy and a 92% entity recognition recall rate. This lets it tell whether “apple” means the fruit or the phone, and match programming tutorials rather than snakes for “Python”. When users search related content, the proportion of effective results on the first screen increased from 38% to 72% (2023 internal testing).
Tokenization: Cutting Text into “Smallest Units Machines Can Understand”
Simply put, this is cutting a continuous text sequence into meaningful “smallest language units” (called “tokens”).
For languages with natural space separation like English, tokenization only needs to split by spaces (e.g., “coffee mug” split into “coffee” + “mug”);
But for Chinese, Japanese and other “space-less languages”, incorrect segmentation directly causes all subsequent entity recognition and semantic understanding to fail.
Rule Base + Deep Learning
Google’s tokenization system uses a “rule base first, deep learning supplemented” hybrid model, with the core goal of segmenting text “fast and accurately”.
Rule Base:
The rule base is the “foundation” of Google’s tokenization system, with built-in common collocation patterns for major languages globally (like “brew coffee”, “pour-over kettle”, and “waterproof test” in Chinese, or “espresso machine” and “drip coffee” in English). These collocations come from statistical analysis of internet text: Google crawls web pages globally and calculates the co-occurrence frequency of each pair of adjacent words (e.g., the probability of “brew” followed by “coffee” is 92%, probability of “brew” followed by “rice” is 85%), ultimately forming “collocation dictionaries” with millions of entries.
For example, when processing the Chinese sentence “How to brew a fragrant pour-over coffee”, the rule base prioritizes matching high-frequency collocations like “brew/coffee”, “pour-over/coffee”, so correctly segmented as “How/to/brew/a/fragrant/pour-over/coffee”;
If encountering “Java programming”, the rule base recognizes “Java” as programming language, “programming” as action, segmented as “Java/programming” rather than “Jav/a/pro/gram/ming” (incorrect segmentation).
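Dictionary-driven segmentation of this kind is often illustrated with forward maximum matching: at each position, take the longest dictionary entry that fits. The sketch below uses a tiny hypothetical dictionary and unspaced English text as a stand-in for Chinese; it is a classic textbook technique, not a description of Google’s actual rule base.

```python
# Hypothetical dictionary entries, for illustration only.
DICTIONARY = {"java", "programming", "brew", "coffee", "pour-over"}

def forward_max_match(text: str, dictionary: set[str]) -> list[str]:
    """Greedy left-to-right segmentation using the longest matching entry."""
    max_len = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, shrinking until one matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)  # unknown text falls back to single characters
                i += size
                break
    return tokens

if __name__ == "__main__":
    print("/".join(forward_max_match("javaprogramming", DICTIONARY)))
    print("/".join(forward_max_match("brewcoffee", DICTIONARY)))
```

Anything missing from the dictionary degrades into single characters, which shows concretely why a dictionary alone cannot handle new or specialist vocabulary.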
Deep Learning:
Although the rule base is efficient, it cannot cover all cases——the internet adds large amounts of new words daily (like “dopamine dressing”, “metaverse”) and professional terminology (like “contractual negligence” in law, “myocardial infarction” in medicine), which are not included in the rule base. At this point, Google calls on the BERT fine-tuned model for dynamic prediction.
BERT (Bidirectional Transformer) is a pre-trained language model that understands word meaning through context.
For example, when encountering “dopamine dressing”, this word is not in the rule base, but BERT predicts based on context (like “bright colors”, “cheerful mood”, “fashion”) that this is an emerging word describing a clothing style, and should be segmented as a whole “dopamine dressing”, rather than “dopa/min/e dress/ing” (incorrect segmentation).
Technical Detail Comparison:
| Technical Type | Advantages | Limitations | Applicable Scenarios |
|---|---|---|---|
| Rule Base | Fast speed (millisecond-level response) | Cannot cover emerging/professional vocabulary | Regular general text |
| BERT Fine-tuned Model | Dynamically identifies new vocabulary, professional terms | High computational cost (requires GPU) | Emerging domains, long-tail text |
Multi-Language Adaptation
Google supports tokenization for over 100 languages, but characteristics of different languages vary greatly, requiring targeted adjustments of rules and models.
Chinese: No Spaces + High Ambiguity
The difficulty of Chinese lies in “no spaces” and “polysemy”. For example, “乒乓球拍卖完了” has two possible segmentations:
- Correct: “乒乓球拍 / 卖完了” (“ping pong paddle / sold out”; “ping pong paddle” is the product)
- Incorrect: “乒乓球 / 拍卖 / 完了” (“ping pong ball / auction / finished”; “auction” is the action).
Google resolves ambiguity through a contextual probability model: statistics show the co-occurrence frequency of “ping pong paddle” as a whole (90% probability of appearing on e-commerce pages) is much higher than the combination “ping pong ball + auction” (only 5% probability, appearing in sports news), so it prioritizes “ping pong paddle / sold out”.
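Choosing between candidate segmentations by probability can be sketched with dynamic programming over word probabilities. The probabilities below are invented for the example, and a simple unigram model is used in place of the contextual model described above.

```python
import math

# Hypothetical relative word frequencies, for illustration only.
WORD_PROB = {
    "乒乓球": 0.02,    # ping pong ball
    "乒乓球拍": 0.03,  # ping pong paddle
    "拍卖": 0.01,      # auction
    "卖完了": 0.02,    # sold out
    "完了": 0.02,      # finished
}

def best_segmentation(text: str, word_prob: dict[str, float]) -> list[str]:
    """Return the segmentation with the highest product of word probabilities."""
    n = len(text)
    # best[i] holds (log probability, tokens) for the best split of text[:i].
    best: list[tuple[float, list[str]]] = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(end):
            word = text[start:end]
            if word in word_prob and best[start][0] > -math.inf:
                score = best[start][0] + math.log(word_prob[word])
                if score > best[end][0]:
                    best[end] = (score, best[start][1] + [word])
    return best[n][1]  # empty list if the text cannot be segmented

if __name__ == "__main__":
    print(" / ".join(best_segmentation("乒乓球拍卖完了", WORD_PROB)))
```

With these numbers the two-word reading scores 0.03 × 0.02, far above the three-word reading at 0.02 × 0.01 × 0.02, so the “paddle / sold out” segmentation wins.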
Arabic: Right-to-Left Writing + Cursive
Arabic is written right-to-left, and there are no spaces between words (e.g., “كتاب” is “book”, “قلم” is “pen”, written together as “كتابقلم”). Google’s tokenization system first reverses the text direction (to left-to-right), then uses the rule base to match boundaries of “كتاب” (book) and “قلم” (pen), ultimately segmented as “كتاب/قلم”.
Swahili: Agglutinative Language Features
Swahili is an agglutinative language, expressing meaning by adding affixes to roots (e.g., “mtoto” is “child”, “watoto” is “children”). Google’s tokenization model identifies affix boundaries (e.g., “m-” is the singular prefix, “wa-” is the plural prefix), correctly segmenting “watoto” as “wa/toto” (plural + child).
Google 2023 multi-language tokenization testing shows it achieves 98% accuracy for mainstream languages like English and Spanish, but only 92% accuracy for complex languages like Arabic and Swahili.
To improve results, Google organizes “language expert teams” for each language, manually annotating 100,000+ typical sentences for training specialized tokenization models.
How Tokenization Errors Affect Search Results
Tokenization is the foundation of all subsequent NLP steps; once segmentation errors occur, it may lead to entity recognition failure and semantic relation deviation, ultimately affecting the relevance of search results. Here are two real cases:
Case 1: E-commerce Page “Java Coffee”
Case 2: Legal Page “Contractual Negligence Liability”
Data Support:
Google internal testing shows that tokenization errors cause target pages to drop 3-5 positions in search results (2023 A/B testing data), and user probability of clicking the page decreases by 42% (due to decreased result relevance).
Extracting Key Points from Text
When users search “2025 iPhone 15 Waterproof Test”, Google needs to quickly know the page’s core is “iPhone 15” (product), “September 2025” (time), and “Waterproof Test” (event).
These key pieces of information are called “entities”.
Multi-Task Learning Model
Google’s entity recognition system is based on a Multi-Task Learning model, simultaneously training “entity recognition”, “part-of-speech tagging”, and “relation extraction” three tasks, improving efficiency through shared underlying parameters.
Simply put, the model learns simultaneously:
- Which words are entities (e.g., “iPhone 15” is a product);
- These words’ grammatical roles in sentences (e.g., “iPhone 15” is a noun);
- Relationships between entities (e.g., “iPhone 15” is produced by “Apple”).
Core Technical Details:
- BERT Fine-tuning: Based on Google’s BERT pre-trained model, fine-tuned through massive annotated data (like Wikipedia, news, e-commerce pages), learning contextual features of entities. For example, in the sentence “iPhone 15 released in September 2025”, “September 2025” and “iPhone 15” are associated through BERT’s contextual vectors, and the model can determine the former is time and the latter is a product.
- Entity Type Classifier: Adding a “type classification head” at BERT’s output layer to predict each entity’s specific type (like TIME, PRODUCT, PERSON). The classifier is based on predefined 50+ entity types (covering general and vertical domains), for example:
| Entity Type | Definition | Example |
|---|---|---|
| TIME | Time point/period | “September 2025”, “30 minutes” |
| PRODUCT | Specific product | “iPhone 15”, “Pourover Kettle” |
| PERSON | Person (real or fictional) | “Tim Cook”, “Zhang Xiaolong” |
| LOCATION | Location (concrete or abstract) | “Shanghai”, “GitHub” |
| EVENT | Event/action | “Waterproof test”, “Press conference” |
| ATTRIBUTE | Entity’s attribute/feature | “IP68 waterproof rating”, “6 meters depth” |
From General to Vertical “Recognition Accuracy”
Google’s entity type system is divided into general domains (covering daily text) and vertical domains (for professional content).
General Domain Entity Types (50+):
Covering 90% of user search scenarios, for example:
- Time (TIME): Specific dates (“September 2025”), durations (“30 minutes”), time periods (“2020-2025”);
- Product (PRODUCT): Electronic devices (“iPhone 15”), home appliances (“pour-over kettle”), daily necessities (“coffee beans”);
- Location (LOCATION): Cities (“Shanghai”), countries (“USA”), organizations (“Google”).
Vertical Domain Entity Types (Industry-Specific):
For professional content in law, medicine, technology, etc., Google additionally trains domain-specific entity types, for example:
- Legal domain: Adding “legal provisions” (like “Civil Code Article 10”), “legal actions” (like “contractual negligence”);
- Medical domain: Adding “diseases” (like “myocardial infarction”), “medications” (like “aspirin”), “surgical methods” (like “PCI surgery”);
- Tech domain: Adding “algorithms” (like “BERT”), “programming languages” (like “Python”), “hardware architecture” (like “ARM”).
Data Support:
Google 2023 internal testing shows that general-domain entity recognition accuracy is 92%, while initial accuracy for vertical domains (such as law) is only 78%, because specialized terms appear less often in general text and annotated data is insufficient.
After a separate “legal entity recognition model” was trained for the legal domain (on 100,000+ annotated legal texts), accuracy improved to 90%; the medical-domain model, trained on 50,000+ annotated medical records, reached 88% accuracy.
The “Four Steps” from Candidate Detection to Boundary Determination
The following uses the sentence “iPhone 15’s IP68 waterproof test in September 2025 shows it held up for 30 minutes at 6 meters depth” to break down the process:
Step 1: Candidate Detection (Finding Possible Entity “Seeds”):
The model first scans the text and marks possible entity candidates using a rule base (for example, “year + month” is a time candidate and “product name + model” is a product candidate) and statistical probability (for example, a 90% probability that a number follows “iPhone”):
- Candidate 1: “September 2025” (matches “year + month” rule);
- Candidate 2: “iPhone 15” (matches “product name + model” rule);
- Candidate 3: “IP68 waterproof test” (matches “technical parameter + action” rule);
- Candidate 4: “6 meters depth” (matches “number + unit + attribute” rule);
- Candidate 5: “30 minutes” (matches “number + time unit” rule).
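The rule-base half of this step can be sketched with regular expressions. The patterns below are hypothetical and tuned only to the example sentence; they are not Google’s actual rules, and the statistical half of the step is left out.

```python
import re

# Hypothetical rule base: (rule name, pattern). Illustrative only.
RULES = [
    ("year + month", re.compile(
        r"\b(?:January|February|March|April|May|June|July|August|"
        r"September|October|November|December) \d{4}\b")),
    ("product name + model", re.compile(r"\biPhone \d+\b")),
    ("technical parameter + action", re.compile(r"\bIP\d{2} waterproof test\b")),
    ("number + unit + attribute", re.compile(r"\b\d+ meters depth\b")),
    ("number + time unit", re.compile(r"\b\d+ (?:seconds|minutes|hours)\b")),
]

def detect_candidates(text):
    """Return (start, end, surface text, rule name) for every rule match."""
    found = []
    for rule_name, pattern in RULES:
        for match in pattern.finditer(text):
            found.append((match.start(), match.end(), match.group(), rule_name))
    return sorted(found)  # ordered by position in the sentence

sentence = ("iPhone 15's IP68 waterproof test in September 2025 shows "
            "it held up for 30 minutes at 6 meters depth")

for start, end, surface, rule in detect_candidates(sentence):
    print(f"{surface!r} matches the {rule!r} rule")
```

The function returns the same five candidates listed above, ordered by where they appear in the sentence rather than by candidate number.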
Step 2: Type Classification (Labeling Candidates):
The model labels each candidate with a type through the Multi-Task Learning “type classification head”:
- “September 2025” → TIME;
- “iPhone 15” → PRODUCT;
- “IP68 waterproof test” → EVENT;
- “6 meters depth” → ATTRIBUTE (describing waterproof depth);
- “30 minutes” → ATTRIBUTE (describing waterproof duration).
Step 3: Boundary Determination (Correcting Entity “Start-End Positions”):
Some candidates may have boundary errors (e.g., “IP68 waterproof test” might be misjudged as “IP68” + “waterproof test”), and the model verifies boundaries through contextual vectors:
- “IP68” is the waterproof rating standard (belongs to ATTRIBUTE), but “IP68 waterproof test” as a whole is an event (EVENT), so boundary corrected to “IP68 waterproof test”;
- In “6 meters depth”, “6 meters” is the value and “depth” is the attribute, so labeling the whole phrase as ATTRIBUTE is more reasonable.
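The article describes boundary verification through contextual vectors. As a much simpler stand-in, the sketch below uses a longest-span-wins heuristic to resolve overlapping candidates. The heuristic and the character offsets are assumptions made for illustration, not the method described above.

```python
def resolve_boundaries(candidates):
    """Among overlapping (start, end, label) spans, keep the longest one."""
    # Longest spans first; ties go to the earlier start position.
    ordered = sorted(candidates, key=lambda c: (-(c[1] - c[0]), c[0]))
    kept = []
    for start, end, label in ordered:
        overlaps = any(start < k_end and k_start < end
                       for k_start, k_end, _ in kept)
        if not overlaps:
            kept.append((start, end, label))
    return sorted(kept)

text = "IP68 waterproof test"
candidates = [
    (0, 4, "ATTRIBUTE"),   # "IP68"
    (5, 20, "EVENT"),      # "waterproof test"
    (0, 20, "EVENT"),      # "IP68 waterproof test"
]

for start, end, label in resolve_boundaries(candidates):
    print(text[start:end], "->", label)
```

Here the full phrase “IP68 waterproof test” survives as a single EVENT, matching the correction described in Step 3.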
Step 4: Global Verification (Correcting Errors Combined with Full Text):
The model generates a “global semantic vector” for the entire text (representing the overall topic, such as “phone waterproof test”) and checks whether local entities conflict with that topic. For example:
- If the text topic is “phone review”, “iPhone 15” labeled as PRODUCT matches the topic;
- “IP68 waterproof test” labeled as EVENT is also consistent with the “phone review” topic, so no correction is needed.
How Google Ensures Entity Recognition Accuracy
| Test Dimension | Initial Accuracy (2020) | Optimized Accuracy (2024) | Improvement Method |
|---|---|---|---|
| General Domain | 85% | 92% | Added 1 million annotated data, optimized BERT fine-tuning parameters |
| Long Text (>5000 words) | 78% | 90% | Introduced “segmented processing” strategy (split into 500-character segments) |
| Vertical Domain (Legal) | 78% | 90% | Trained domain-specific model (100,000+ annotated legal texts) |
| Emerging Entities (like “dopamine dressing”) | 62% | 85% | Combined BERT’s contextual prediction capability, dynamically identified new vocabulary |
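The “segmented processing” strategy in the table can be illustrated with a few lines of Python. This is a minimal sketch that assumes a plain fixed-width split; a production system would presumably split on sentence boundaries or overlap the segments so that an entity is not cut in half.

```python
def split_into_segments(text, size=500):
    """Split text into consecutive segments of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]

# A stand-in for a long page: 1,200 characters of filler text.
document = "x" * 1200
segments = split_into_segments(document)
print([len(segment) for segment in segments])  # [500, 500, 200]
```

Each segment can then be passed to the entity recognizer separately, and the results merged afterwards.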
User Feedback:
Google collects user search behavior data (such as whether clicked pages contain the target entities) and feeds it back to optimize the model.
For example, if a user searches “iPhone 15 waterproof rating” but the clicked page does not label “IP68” as an ATTRIBUTE, the model adjusts its parameters to strengthen recognition of entities related to “waterproof rating”.
Establishing Relationships Between Words, Building Logic
When users search “shoes suitable for running”, Google needs to know the relationship between “running” and “shoes” (function/purpose), the relationship between “cushioning midsole” and “running shoes” (attribute), to return truly relevant results.
This ability to “establish relationships between words” is called semantic relation extraction.
Pre-trained Models and Knowledge Graph
1. Pre-trained Models: “Self-Learning” Relationships from Massive Texts:
Pre-trained models (like BERT, PaLM) are the core “learners” for semantic relations. They automatically capture implicit relationships between words by analyzing trillions of texts on the internet (like web pages, books, forums). For example:
- In sentences like “running shoes are suitable for long-distance running” and “basketball shoes are suitable for jumping”, the model learns the functional relationship between “running shoes” and “long-distance running”, “basketball shoes” and “jumping”;
- In sentences like “iPhone 15 is equipped with A17 chip” and “MacBook Pro uses M3 chip”, the model learns the “equipped with” relationship between “iPhone 15” and “A17 chip”, “MacBook Pro” and “M3 chip”.
Technical Details:
Pre-trained models represent each word’s semantics through “contextualized embeddings”.
For example, vectors of “running shoes” in different sentences change with context (like “running shoes have good cushioning” vs “running shoes have fashionable appearance”), and the model can capture these subtle differences to determine specific relationships between words.
2. Knowledge Graph: Using Structured Knowledge to “Verify + Supplement” Relationships:
Although pre-trained models can learn implicit relationships, they may have errors (like misjudging the relationship between “apple” and “fruit” as “brand”).
At this point, Google’s knowledge graph (containing over 500 million entities, 20 billion relationships) provides structured knowledge for verifying and supplementing model-learned relationships.
For example, when the model analyzes the sentence “iPhone 15’s screen supplier is Samsung”:
- The pre-trained model learns the “supplier” relationship between “iPhone 15” and “Samsung” through context;
- The structured relationship “iPhone 15 → screen supplier → Samsung” already exists in the knowledge graph, verifying the relationship is correct, and ultimately confirming the association between “iPhone 15” and “Samsung”.
The Relationship Network from Basic to Complex
Google defines more than 20 fine-grained relationship types, covering 90% of user search scenarios. These relationships fall into two major categories, basic and complex:
1. Basic Relationships (General Domain):
| Relationship Type | Definition | Example (from webpage “How to Choose Running Shoes”) |
|---|---|---|
| Hyponymy Relationship | A is a subclass of B (or vice versa) | “Running shoes” → “Sports equipment” (running shoes belong to sports equipment) |
| Attribute Relationship | A is B’s feature/parameter | “Cushioning midsole” → “Running shoes” (cushioning midsole is an attribute of running shoes) |
| Functional Purpose | A is used for B | “Pour-over kettle” → “Brew coffee” (pour-over kettle is used for brewing coffee) |
| Time Sequence | A happens before/after B | “Release” → “Launch” (product is released before launching) |
2. Complex Relationships (Vertical Domain):
For professional content in law, medicine, technology, etc., Google adds more granular relationship types:
- Legal Domain: “contractual negligence liability” → “violation of good faith principle” (causal relationship); “Civil Code Article 10” → “marriage validity” (scope of application relationship).
- Medical Domain: “myocardial infarction” → “coronary artery blockage” (cause of disease relationship); “aspirin” → “inhibit platelet aggregation” (pharmacological action relationship).
- Technology Domain: “Python” → “crawler tutorial” (application domain relationship); “ARM architecture” → “low power consumption” (technical characteristic relationship).
The “Five Steps” from Candidate Relation Mining to Global Verification
The following uses the sentence “When choosing running shoes, the cushioning midsole is key, it can reduce knee pressure” to break down the process:
Step 1: Candidate Relation Mining (Finding Possible “Relation Seeds”):
The model first scans the text and marks possible candidate relationships using a rule base (for example, “X is the key to Y” may imply a “functional purpose” relationship) and statistical probability (for example, a 90% co-occurrence probability between “cushioning midsole” and “running shoes”):
- Candidate 1: “Running shoes” and “cushioning midsole” (possible attribute relationship);
- Candidate 2: “cushioning midsole” and “reduce knee pressure” (possible functional purpose relationship).
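The “co-occurrence probability” signal used in this step can be estimated from sentence-level counts. The sketch below assumes a tiny invented corpus and simple substring matching; real systems would work on far larger corpora with proper tokenization.

```python
def cooccurrence_probability(sentences, term_a, term_b):
    """Estimate P(term_b in sentence | term_a in sentence)."""
    with_a = [s for s in sentences if term_a in s]
    if not with_a:
        return 0.0
    with_both = [s for s in with_a if term_b in s]
    return len(with_both) / len(with_a)

# Invented example corpus, one sentence per entry.
corpus = [
    "when choosing running shoes, the cushioning midsole is key",
    "a cushioning midsole can reduce knee pressure",
    "running shoes with a cushioning midsole suit long-distance running",
    "trail running shoes need a grippy outsole",
]

p = cooccurrence_probability(corpus, "cushioning midsole", "running shoes")
print(f"P(running shoes | cushioning midsole) = {p:.2f}")
```

A pair whose probability clears some threshold would be kept as a candidate relation and passed on to the classification step.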
Step 2: Relation Type Classification (Labeling Candidates):
The model labels each candidate with a relationship type through the pre-trained model’s “relation classification head”:
- “Running shoes” and “cushioning midsole” → attribute relationship (cushioning midsole is an attribute of running shoes);
- “Cushioning midsole” and “reduce knee pressure” → functional purpose relationship (cushioning midsole is used to reduce knee pressure).
Step 3: Boundary Determination (Correcting the “Scope of Action” of Relationships):
Some candidates may have boundary errors (for example, “cushioning midsole” might be misjudged as a component of “running shoes” rather than an attribute), and the model verifies boundaries through contextual vectors:
- “Cushioning midsole” describes a material or structural feature of running shoes, so it is treated as an attribute rather than a component (components are parts such as the “sole” and “upper”), and the relationship is corrected to an attribute relationship.
Step 4: Global Verification (Correcting Errors Combined with Full Text):
The model generates a “global semantic vector” for the entire text (representing the overall topic, like “Running Shoes Buying Guide”), and checks whether local relationships conflict with the global topic. For example:
- If the text topic is “Running shoes buying guide”, the functional purpose relationship between “cushioning midsole” and “reduce knee pressure” matches the topic;
- If the text topic is “Sports injury prevention”, then the relationship needs to be re-evaluated for relevance to “injury prevention”.
Step 5: Knowledge Graph Verification (Using Structured Knowledge as the “Bottom Line”):
The model calls the knowledge graph to verify the rationality of relationships:
- In the knowledge graph, attributes of “running shoes” include “cushioning midsole”, “weight”, “sole material”, confirming “cushioning midsole” is a legitimate attribute of running shoes;
- In the knowledge graph, functions of “cushioning midsole” include “reduce knee pressure”, “improve comfort”, confirming “reduce knee pressure” is its legitimate function.
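A knowledge graph lookup of this kind can be sketched as a membership test on (subject, relation, object) triples. The triples and the relation names `has_attribute` and `has_function` are hypothetical, chosen to mirror the two bullets above.

```python
# Hypothetical mini knowledge graph stored as (subject, relation, object) triples.
KNOWLEDGE_GRAPH = {
    ("running shoes", "has_attribute", "cushioning midsole"),
    ("running shoes", "has_attribute", "weight"),
    ("running shoes", "has_attribute", "sole material"),
    ("cushioning midsole", "has_function", "reduce knee pressure"),
    ("cushioning midsole", "has_function", "improve comfort"),
}

def verify_relation(subject, relation, obj, graph=KNOWLEDGE_GRAPH):
    """Return True if the extracted relation already exists in the graph."""
    return (subject, relation, obj) in graph

extracted = [
    ("running shoes", "has_attribute", "cushioning midsole"),
    ("cushioning midsole", "has_function", "reduce knee pressure"),
    ("cushioning midsole", "has_function", "brew coffee"),
]
for triple in extracted:
    status = "confirmed" if verify_relation(*triple) else "not found"
    print(triple, "->", status)
```

The first two relations are confirmed by the graph, while the deliberately wrong third one is not found and would need further checking.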
How Google Ensures Semantic Relation Accuracy
| Test Dimension | Initial Accuracy (2020) | Optimized Accuracy (2024) | Improvement Method |
|---|---|---|---|
| Common Relationships (hyponymy, attribute) | 78% | 88% | Added 2 million annotated data, optimized BERT fine-tuning parameters |
| Complex Relationships (causal, functional purpose) | 65% | 82% | Introduced “chain reasoning” technology (connecting distant entities through intermediate nodes) |
| Vertical Domain (Medical) | 60% | 79% | Trained domain-specific model (50,000+ annotated medical texts) |
| Emerging Relationships (like “AI large model → multimodal”) | 52% | 75% | Combined pre-trained model’s contextual prediction capability, dynamically identified new relationships |
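The “chain reasoning” idea in the table, connecting distant entities through intermediate nodes, resembles a path search over a relation graph. The sketch below uses breadth-first search over a few invented triples; it is only an illustration of the idea, not a description of how Google implements it.

```python
from collections import deque

def find_chain(edges, start, goal):
    """Breadth-first search for the shortest relation chain from start to goal."""
    graph = {}
    for subject, relation, obj in edges:
        graph.setdefault(subject, []).append((relation, obj))
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None  # no chain connects the two entities

# Invented directed relations for the running-shoes example.
edges = [
    ("running shoes", "has_attribute", "cushioning midsole"),
    ("cushioning midsole", "has_function", "reduce knee pressure"),
    ("running shoes", "is_a", "sports equipment"),
]

chain = find_chain(edges, "running shoes", "reduce knee pressure")
print(chain)
```

The returned chain links “running shoes” to “reduce knee pressure” through the intermediate node “cushioning midsole”, even though no single sentence states that relationship directly.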
Correcting Word Semantic Bias Combined with Full Text
When users search “Python tutorial”, Google needs to determine whether “Python” in the page is the programming language (62%) or snake (18%);
When users search “Apple press conference”, it needs to confirm “Apple” is a tech company (95%) rather than fruit (5%).
This ability to “correct word semantic bias combined with full text” is called contextual disambiguation.
Bidirectional Attention and Global Semantics
1. Semantic Capture “Looking Both Ways” Simultaneously:
The bidirectional attention mechanism (the core design of BERT) lets the model analyze the text before and after a word at the same time, capturing the context on both sides.
For example, when processing the sentence “Xiao Ming’s apples are ripe”, the model first focuses on “Xiao Ming” and “ripe”, initially judging that “apple” is probably the fruit;
But when it processes the next sentence, “He plans to use Apple to release a new system”, the model looks back at the previous text, finds that “release a new system” has nothing to do with fruit, and corrects the meaning of “Apple” to “tech company”.
Technical Details:
Bidirectional attention is implemented through “Query-Key-Value” matrices:
- Query: the semantic vector of the current word;
- Key: the semantic vectors that the current word is compared against (one per word in the sequence);
- Value: the semantic vectors that are summed, weighted by the attention weights, to form the current word’s new representation.
The model calculates the similarity between the “Query” and each “Key” and normalizes the scores into “attention weights”. The higher the weight, the greater that word’s semantic influence on the current word.
For example, the attention weight between “release new system” and “Apple” is as high as 0.8 (the maximum is 1), much higher than the 0.2 between “ripe” and “Apple”, so the model relies mainly on “release new system” to correct the meaning of “Apple”.
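The Query-Key-Value computation can be sketched as scaled dot-product attention for a single query. The 2-dimensional vectors are invented (axis 0 loosely stands for “technology”, axis 1 for “food”), so the resulting weights will not match the 0.8 and 0.2 quoted above; real models learn these vectors and use many attention heads.

```python
import math

def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy vectors (assumptions): axis 0 ~ "technology", axis 1 ~ "food".
query = [2.0, 0.0]                    # "Apple" in the second sentence
keys = [[2.0, 0.0], [0.0, 2.0]]       # "release new system", "ripe"
values = [[1.0, 0.0], [0.0, 1.0]]

weights, output = attend(query, keys, values)
print("attention weights:", [round(w, 3) for w in weights])
print("new representation of 'Apple':", [round(x, 3) for x in output])
```

Because the query is most similar to the “release new system” key, that word receives the larger weight and pulls the representation of “Apple” toward the technology axis.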
2. “Topic Anchor” of the Whole Page Content:
In addition to local sentence context, Google also generates a “global semantic vector” for the entire page content, representing the overall topic of the page (such as “Tech Product Review” or “Weight Loss Recipes”).
When the semantics of local words conflict with the global topic, the model prioritizes correcting to the meaning that matches the topic.
For example, when processing a page titled “2025 iPhone 15 Waterproof Test”:
- In the local sentence “Apple’s latest released iPhone 15 supports satellite communication”, “Apple” initially might be “fruit”;
- But the global semantic vector shows the page topic is “phone review”, so the model corrects “Apple” to “tech company”.
The “Four Steps” from Local Ambiguity to Global Consistency
The following uses the web page content “Apple’s latest released iPhone 15 supports satellite communication; it is good news for outdoor enthusiasts” to break down the process:
Step 1: Local Ambiguity Detection (Marking “Suspicious” Words):
The model first scans the full text and identifies words that may be ambiguous (polysemous words, pronouns, etc.). In this example, “Apple” is a typical polysemous word (fruit/tech company), and “it” is a pronoun whose referent needs to be clarified.
Step 2: Local Context Analysis (Extracting “Candidate Semantics”):
For each “suspicious” word, the model analyzes its local context (previous or following 1-3 sentences), extracting possible candidate semantics:
- “Apple”’s candidate semantics:
  - Candidate 1: fruit (based on common collocations with words like “ripe”, “eat”);
  - Candidate 2: tech company (based on common collocations with words like “released iPhone 15”, “satellite communication”).
- “It”’s candidate semantics:
  - Candidate 1: iPhone 15 (refers to “iPhone 15” in the previous sentence);
  - Candidate 2: satellite communication (refers to “satellite communication function” in the previous sentence).
Step 3: Global Semantic Verification (Matching Page Topic):
The model generates a “global semantic vector” for the entire page (encoded by BERT for the full text), calculates similarity with candidate semantic vectors, and selects the semantics that best match the global topic:
- The page title and body text frequently contain words like “iPhone 15”, “satellite communication”, “outdoor enthusiasts”, and the global semantic vector points to “Tech Product Review”;
- Among the candidate semantics for “Apple”, the similarity between “tech company” and the global topic (cosine similarity 0.85) is much higher than that of “fruit” (0.12), so “tech company” is prioritized;
- Among the candidate semantics for “it”, the similarity between “iPhone 15” and the global topic (0.9) is much higher than that of “satellite communication” (0.6), so the pronoun is resolved to “iPhone 15”.
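The similarity comparison in this step can be sketched with plain cosine similarity. The 3-dimensional vectors below are invented stand-ins for the BERT-encoded page vector and sense vectors, so the scores differ from the 0.85 and 0.12 quoted above; only the selection logic is illustrated.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def pick_sense(global_vector, candidate_senses):
    """Return the sense whose vector is most similar to the page topic."""
    return max(candidate_senses,
               key=lambda sense: cosine_similarity(global_vector,
                                                   candidate_senses[sense]))

# Invented vectors: a page about tech product reviews and two senses of "Apple".
global_vector = [0.9, 0.1, 0.3]
apple_senses = {
    "tech company": [0.8, 0.0, 0.4],
    "fruit": [0.0, 1.0, 0.1],
}

best = pick_sense(global_vector, apple_senses)
for sense, vector in apple_senses.items():
    print(sense, round(cosine_similarity(global_vector, vector), 2))
print("selected:", best)
```

The same function could be reused for the pronoun “it” by passing vectors for its candidate referents instead.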
Step 4: Conflict Resolution (Handling Contradictions from Multiple Sources):
If local context conflicts with global topic (like “Apple” in one sentence refers to fruit, but the full text topic is technology), the model further analyzes the cause of the conflict:
- If it’s a “typo” (like “Apple” should be “strawberry”), the model preserves the global semantics;
- If it’s “polysemy coexistence” (like the page discusses both “apple fruit” and “Apple company”), the model generates “semantic layers”, prioritizing display of the meaning relevant to user queries.
How Google Ensures Contextual Correction Accuracy
| Test Dimension | Initial Accuracy (2020) | Optimized Accuracy (2024) | Improvement Method |
|---|---|---|---|
| Polysemous Query (Python) | 58% | 82% | Introduced BERT bidirectional attention mechanism, added 1 million annotated polysemous texts |
| Pronoun Correction (“it”) | 65% | 89% | Trained “coreference resolution model” (based on 100,000+ annotated coreference sentences) |
| Long Text (>5000 words) | 52% | 78% | Introduced “segmented global vectors” (generating local global vectors every 500 characters) |
| Cross-Language Correction (English → Chinese) | 48% | 75% | Combined multilingual BERT model, added 500,000 cross-language alignment annotations |
How NLP Determines What Users Want
Google’s NLP technology determines users’ true needs by analyzing three aspects of a search query: its “intent type” (informational, navigational, or transactional), “semantic expansion” (implicit needs), and “scenario adaptation” (time, location, and device).
Google processes over 8.5 billion searches daily (2024 data). After NLP was introduced, the click-through rate (CTR) for informational queries rose from 12% to 28%, and with BERT model optimization, accuracy on polysemous queries rose from 58% to 82%.
Intent Types
1. Informational Needs: Users Want to “Learn Knowledge”:
Characteristic words: “how to”, “principle”, “reason”, “tutorial”, etc.
Example: Users search “how to brew pour-over coffee”, “causes of myocardial infarction”, and NLP matches tutorial and popular science pages.
Data support: Google 2023 internal testing shows that the proportion of effective results on the first screen for informational queries increased from 38% to 72% (by identifying keywords like “how to”).
2. Navigational Needs: Users Want to “Find a Specific Website”:
Characteristic words: “official website”, “official”, “login”, “register”, etc.
Example: Users search “Taobao official website”, “Apple ID login”, and NLP directly points to official websites rather than third-party pages.
Data support: Microsoft 2024 research shows that the probability of users clicking the target website for navigational queries increased from 45% to 89% (NLP precisely identifies words like “official website”).
3. Transactional Needs: Users Want to “Buy Things/Services”:
Characteristic words: “recommendation”, “budget-friendly”, “discount”, “buy”, etc.
Example: Users search “budget mechanical keyboard recommendation”, “nearby gas station”, and NLP prioritizes displaying e-commerce pages or local businesses.
Data support: eMarketer 2024 survey shows that the conversion rate for transactional queries increased from 3.2% to 5.8% (NLP covers implicit needs like “recommendation” and “discount”).
Intent Type Comparison Table:
| Type | Example Keywords | User Goal | NLP Matching Strategy |
|---|---|---|---|
| Informational | How to, principle, tutorial | Gain knowledge | Match tutorial/popular science pages |
| Navigational | Official website, official, login | Visit specific website | Directly point to official website |
| Transactional | Recommendation, budget-friendly, discount, buy | Purchase goods/services | Prioritize display of e-commerce/local business pages |
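The keyword side of this table can be sketched as a simple rule-based classifier. The keyword lists are copied from the table; the counting logic and the `"unknown"` fallback are assumptions, and a real system would rely on learned models rather than substring matching.

```python
# Characteristic words taken from the comparison table above.
INTENT_KEYWORDS = {
    "informational": ["how to", "principle", "reason", "tutorial"],
    "navigational": ["official website", "official", "login", "register"],
    "transactional": ["recommendation", "budget-friendly", "discount", "buy"],
}

def classify_intent(query, default="unknown"):
    """Return the intent whose keywords appear most often in the query."""
    text = query.lower()
    hits = {intent: sum(1 for keyword in keywords if keyword in text)
            for intent, keywords in INTENT_KEYWORDS.items()}
    best = max(hits, key=hits.get)  # ties go to the first intent listed
    return best if hits[best] > 0 else default

for query in ["how to brew pour-over coffee",
              "Apple ID login",
              "budget mechanical keyboard recommendation"]:
    print(query, "->", classify_intent(query))
```

Queries with no characteristic word, which is most of them in practice, fall through to the default, which is where semantic models take over from keyword rules.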
Semantic Expansion
User search terms usually express only 10%-20% of the underlying need; the remaining 80%-90% is implicit (such as “price”, “difficulty”, “applicable scenarios”).
NLP uses semantic expansion to extend from core words to related needs, proactively covering intents the user has not stated.
Expansion Method 1: Related Word Expansion:
NLP associates core words with semantically similar words based on “word embedding space” (Word Embedding). For example:
- Core word “weight loss recipes” → related words “low-calorie”, “easy to make”, “suitable for office workers”, “sugar-free”;
- Core word “what to wear on rainy days” → related words “waterproof”, “non-slip”, “lightweight”, “warm”.
Data support: Google 2022 A/B testing shows search results covering implicit needs extended user dwell time from 45 seconds to 78 seconds (73% increase).
Expansion Method 2: Scenario-Based Expansion:
NLP further refines needs by combining search time, location, and device. For example:
- Time scenario: searching “jackets” in winter → expanded to “fleece-lined”, “warm”; searching “jackets” in summer → expanded to “UV protection”, “lightweight”;
- Location scenario: searching “hot pot” in Shanghai → expanded to “local popular”; searching “hot pot” in Chengdu → expanded to “authentic Sichuan-style”;
- Device scenario: searching “nearby gas station” on phone → expanded to “real-time gas prices”, “nearest distance”; searching on computer → expanded to “user reviews”, “promotional activities”.
Data support: Microsoft 2024 multi-scenario research shows scenario-based expansion shortened user task completion time by 42% (mobile from 90 seconds to 52 seconds).
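Scenario-based expansion can be pictured as a lookup keyed by the query plus a context signal. The table below simply encodes the examples from the list above; the `expand_query` function and the signal names `season`, `city`, and `device` are hypothetical.

```python
# Expansion terms taken from the examples above; keys are (query, signal, value).
SCENARIO_EXPANSIONS = {
    ("jackets", "season", "winter"): ["fleece-lined", "warm"],
    ("jackets", "season", "summer"): ["UV protection", "lightweight"],
    ("hot pot", "city", "Shanghai"): ["local popular"],
    ("hot pot", "city", "Chengdu"): ["authentic Sichuan-style"],
    ("nearby gas station", "device", "phone"): ["real-time gas prices", "nearest distance"],
    ("nearby gas station", "device", "computer"): ["user reviews", "promotional activities"],
}

def expand_query(query, context):
    """Collect expansion terms for every (signal, value) pair in the context."""
    terms = []
    for signal, value in context.items():
        terms.extend(SCENARIO_EXPANSIONS.get((query, signal, value), []))
    return terms

print(expand_query("jackets", {"season": "winter"}))
print(expand_query("hot pot", {"city": "Chengdu", "device": "phone"}))
```

Signals with no matching entry contribute nothing, so the same query can be expanded differently depending on which context is available.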
How NLP “Understands” User Needs
1. Natural Language Understanding (NLU):
NLU is the foundation of NLP; it “disassembles” user queries through tokenization, entity recognition, and semantic relation extraction. For example:
- The user searches “2025 iPhone 15 waterproof test” → tokenized as “2025 / iPhone 15 / waterproof test”;
- Entities are recognized as “TIME (2025)”, “PRODUCT (iPhone 15)”, “EVENT (waterproof test)”;
- The semantic relations are merged into “2025 iPhone 15’s waterproof performance test”.
Data support: Google 2023 tech blog shows NLU achieves 92% accuracy for complex query disassembly (general domain).
2. Deep Learning Models (like BERT):
BERT and other pre-trained models learn “contextual semantics” through trillions of texts, solving ambiguity problems. For example:
- User searches “Python” → BERT analyzes context (like “print() function”, “crawler tutorial”) → judged as programming language;
- User searches “Java” → BERT combines related words like “coffee”, “programming” → judged as programming language (62%) or island (18%).
Data support: Google 2024 internal testing shows BERT improved polysemous query accuracy from 58% to 82%.
3. Real-Time Scenario Data Integration:
NLP integrates user device time, geographic location, search history and other real-time data, dynamically adjusting need judgments. For example:
- User searches “nearby gas station” on phone → NLP obtains GPS location → prioritizes displaying gas stations within 3 kilometers;
- User searches “movie tickets” on weekend → NLP combines time (weekend) → recommends showings at popular cinemas.
Data support: Pew Research 2024 survey shows after integrating real-time scenario data, user satisfaction with search results increased from 68% to 85%.
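The “gas stations within 3 kilometers” example comes down to a distance filter on GPS coordinates. The sketch below uses the haversine great-circle formula; the coordinates and station names are made up, and how Google actually ranks local results is not described in this article.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (latitude, longitude) points in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def stations_within(user, stations, radius_km=3.0):
    """Return station names within radius_km of the user, nearest first."""
    nearby = []
    for name, lat, lon in stations:
        distance = haversine_km(user[0], user[1], lat, lon)
        if distance <= radius_km:
            nearby.append((distance, name))
    return [name for _, name in sorted(nearby)]

# Illustrative coordinates only.
user_location = (31.2304, 121.4737)
stations = [
    ("Station A", 31.2354, 121.4787),  # under 1 km away
    ("Station B", 31.3304, 121.4737),  # about 11 km away
    ("Station C", 31.2314, 121.4747),  # a few hundred meters away
]

print(stations_within(user_location, stations))
```

Station B lies outside the 3-kilometer radius and is dropped, while the remaining stations are listed nearest first.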
Real Effects
Here are user behavior data for three typical scenarios:
| Scenario Type | Traditional Search (without NLP) | NLP-Optimized Search | Effect Improvement | Data Source |
|---|---|---|---|---|
| Informational Query (How to make cake) | First screen mixed with ads and irrelevant tutorials | First screen directly displays step-by-step tutorials | Dwell time from 45s → 78s (+73%) | Google 2022 A/B testing |
| Navigational Query (Taobao official website) | First screen includes third-party shopping platforms | First screen only displays Taobao official website | Click target website probability from 45% → 89% | Microsoft 2024 research |
| Transactional Query (Budget mechanical keyboard) | First screen mixed with high-priced products | First screen prioritizes displaying cost-effective models | Conversion rate from 3.2% → 5.8% (+81%) | eMarketer 2024 survey |
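The percentage figures quoted in this section follow from the before and after values by simple relative change, which the short check below reproduces. It only verifies the arithmetic; it says nothing about how the underlying measurements were taken.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100

print(round(percent_change(45, 78)))    # dwell time 45s -> 78s: 73
print(round(percent_change(3.2, 5.8)))  # conversion rate 3.2% -> 5.8%: 81
print(round(percent_change(90, 52)))    # task time 90s -> 52s: -42
```

Note that the navigational row reports a change in percentage points (45% to 89%), not a relative increase.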
Finally, I want to stress that the core of how NLP judges user needs is transforming the “words users type” into “what users truly intend”.



