Beyond English: Architecting Search for a Global World
In a world where over half of all web content is non-English, most search systems are still built on a flawed, English-centric foundation. This "monolingual trap" leads to catastrophic failures in global markets, frustrating users and costing businesses dearly. Beyond English is the definitive architect's guide to escaping this trap. This book provides a comprehensive framework for designing and implementing search systems that are not just translated, but are linguistically and culturally fluent. It moves from universal principles (language detection, indexing pipelines, and query understanding) to deep, practical dives into the world's major language families.

The Book's Organizational Framework

Universal Challenges
The book opens by establishing the universal challenges of multilingual search, introducing the "monolingual trap": the flawed assumption that all users search like English speakers. It presents striking data showing that over half of web content is non-English and clarifies the distinctions between monolingual, multilingual, and cross-lingual search to build a shared conceptual foundation.

The Core Technical Foundation
Next, the book outlines the "Core Components of a Multilingual Search System," forming the technical backbone for later chapters. It covers language detection for both queries and documents, then explores the indexing pipeline (tokenization, normalization, stemming, and lemmatization), explaining how each step must be adapted to different languages. Further sections address query processing (understanding intent across languages), ranking and relevance (adapting scoring to linguistic variation), and cross-lingual methods such as translation, embeddings, and semantic matching. It concludes with evaluation metrics tailored to multilingual systems.

The Language Family Chapters
The main body of the book consists of language family–based chapters, each following a consistent structure that highlights distinct linguistic and engineering challenges.

Latin-Based Languages: Explores diacritics, elisions, compound words, and morphological richness across languages like German, French, Spanish, Portuguese, Italian, Catalan, Turkish, and the Scandinavian group. Each subsection identifies unique issues and offers targeted implementation guidance.
Slavic and Cyrillic Languages: Examines morphological complexity in Russian, Ukrainian, Polish, Bulgarian, and Serbian, addressing script duality and the cultural sensitivities tied to language use and search behavior.
East Asian (CJK) Languages: Covers Chinese, Japanese, and Korean, focusing on segmentation and script diversity. Chinese requires precise word boundaries, Japanese manages multiple scripts, and Korean combines phonetic writing with agglutinative morphology.
Indic and Thai Scripts: Discusses languages written in abugida scripts, such as Hindi, Bengali, Tamil, and Thai, alongside Vietnamese, tackling challenges such as conjunct characters, tonal marks, and script-driven word segmentation.
Middle Eastern and Right-to-Left Languages: Explores Arabic, Hebrew, Persian, and Urdu, addressing bidirectional rendering and the complex root-and-pattern morphology that produces extensive word families.
African and Emerging Languages: Highlights under-resourced languages like Swahili, Amharic, Yoruba, and Hausa, presenting innovative strategies using transfer learning and community-driven data initiatives.

This framework creates a balance between universal principles and language-specific insights, guiding readers from core concepts to the cutting edge of multilingual search.
Publication year: 2025
Language: English
Format:
eBooks sold by Feltrinelli.it are in ePub format and may be protected by Adobe DRM. Downloading a DRM-protected file yields a file in .acsm format (Adobe Content Server Message), which must be opened with Adobe Digital Editions and authorized with an Adobe account before it can be read on a PC or transferred to compatible devices.
Cloud:
eBooks sold by Feltrinelli.it are automatically synced to all Kobo reading clients after purchase. Thanks to the Kobo Cloud, reading progress, notes, and highlights are saved and synced automatically across all Kobo devices and reading apps in use.