Киселёва, Saint Petersburg State University. Received by the editors 27.09.2010

Abstract: The main result of this work is an automatic algorithm for segmenting user queries. It transforms a keyword query into a structured query, using the user click log and information from dictionaries. <...>

Annotation: We describe the results of experiments with an unsupervised framework for query segmentation that transforms keyword queries into structured queries. <...> The resulting queries can be used to search product databases more accurately, and can potentially improve result presentation and query suggestion. <...> The key to developing an accurate and scalable system for this task is to train a query segmentation or attribute detection system over labeled data, which can be acquired automatically from query and click-through logs. <...> The main contribution of our work is an improved method for acquiring such training data automatically, resulting in significantly higher segmentation performance than previously reported methods. <...>

INTRODUCTION

This work focuses on the problem of detecting and labeling product attribute values in keyword queries. Doing so enables structured querying of product databases and more effective ranking and filtering of the results, and can potentially improve result presentation. <...> The main contribution of this work is an unsupervised approach to this problem, improved over that of [4], that trains the extraction/segmentation system using only the product click data. <...> The key idea is to automatically and robustly align the query terms to attribute terms via click data, resolve ambiguities using frequency and similarity statistics, and then use the resulting automatically generated alignments to train a text segmentation or information extraction system.
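The alignment step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the toy click log, the function names, and the `"O"` label for unmatched terms are all hypothetical, and ambiguities are resolved by match frequency alone (the similarity statistics mentioned in the text are left out).

```python
from collections import Counter, defaultdict

# Hypothetical toy click log: (query, attributes of the clicked product).
CLICK_LOG = [
    ("apple ipod", {"brand": "Apple", "model": "iPod"}),
    ("apple iphone 4", {"brand": "Apple", "model": "Apple iPhone 4"}),
    ("cheap apple macbook", {"brand": "Apple", "model": "MacBook"}),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def candidate_labels(token, attributes):
    """Names of all attributes whose value contains the token."""
    return sorted(
        name for name, value in attributes.items() if token in tokenize(value)
    )


def build_term_stats(click_log):
    """Count, over the whole log, how often each token matched each attribute."""
    stats = defaultdict(Counter)
    for query, attributes in click_log:
        for token in tokenize(query):
            for name in candidate_labels(token, attributes):
                stats[token][name] += 1
    return stats


def align(query, attributes, stats):
    """Label each query token with one attribute name, or "O" if none matches.

    When a token matches several attributes of the clicked product, the
    attribute it matched most often across the log wins (assumed heuristic).
    """
    labeled = []
    for token in tokenize(query):
        candidates = candidate_labels(token, attributes)
        if candidates:
            label = max(candidates, key=lambda name: stats[token][name])
        else:
            label = "O"
        labeled.append((token, label))
    return labeled


if __name__ == "__main__":
    stats = build_term_stats(CLICK_LOG)
    for query, attributes in CLICK_LOG:
        print(align(query, attributes, stats))
```

In the second record the term "apple" occurs in both the brand and the model value. Because it matches the brand in all three records and the model in only one, it is labeled as brand. Token/label sequences of this kind are what a sequence model such as a CRF could then be trained on.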
<...> This improved unsupervised approach has multiple advantages over the previous supervised and semi-supervised methods [1]: <...>

© Киселёва Ю. <...>

RELATED WORK

A good number of semi-supervised and unsupervised methods for Conditional Random Fields (CRF) have been published in recent years. <...> These works used additional resources for semi-supervised or unsupervised information extraction. <...> For example, in [2] a database was used to create artificially annotated training data to <...>

ВЕСТНИК ВГУ, СЕРИЯ: СИСТЕМНЫЙ АНАЛИЗ И ИНФОРМАЦИОННЫЕ ТЕХНОЛОГИИ, 2010, № 2. Automatic segmentation of user queries