Modern Methods Of Determining Relevance In Web Search Systems
Keywords:
relevance, web search, TF-IDF, BM25, BERT, semantic search, Learning to Rank, dense retrieval, information retrieval, neural network.
Abstract
This article examines modern methods for determining relevance in web search engines. The analysis covers classical approaches such as TF-IDF and BM25, as well as neural network-based techniques including BERT, dense retrieval, and multi-vector models. It also discusses semantic search, Learning to Rank (LTR), user behavioral signals, and multimodal search systems. The effectiveness of these methods is evaluated based on the practical experience of major systems: Google, Bing, and Elasticsearch.
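As a point of reference for the classical approaches named above, the following is a minimal sketch of BM25 scoring over pre-tokenized documents. The function name `bm25_scores`, the toy corpus, and the whitespace tokenization are illustrative assumptions, not taken from the article. The IDF term uses the smoothed form found in Lucene-based engines, and `k1=1.2`, `b=0.75` are the commonly used defaults.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with BM25.

    Hypothetical helper for illustration; `query` is a list of terms and
    `docs` is a list of term lists.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF, always non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (assumed for illustration only).
docs = [
    "bm25 ranks documents by term frequency".split(),
    "neural models embed queries and documents".split(),
    "bm25 bm25 bm25".split(),
]
scores = bm25_scores("bm25 ranking".split(), docs)
```

In this toy corpus the second document shares no term with the query and scores zero. The third document repeats the query term in a short text, so it outranks the first. This shows the two effects BM25 adds on top of plain TF-IDF: term-frequency saturation and length normalization.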