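Title: 以使用記錄分析探索網路使用者檢索興趣之研究 (A Study of Exploring Web Users' Search Interests through Usage Log Analysis)
Author: 卜小蝶
Author (English name): Hsiao-Tieh Pu
Institution: National Chiao Tung University
Department: Institute of Information Management
Advisor: 楊千
Degree: Ph.D.
Publication year: 2002
Keywords: Query Log Analysis; Network User Studies; Network Information Retrieval; Circulation History Analysis; Association Analysis; User-oriented Classification Scheme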
Citation statistics:
  • Times cited: journal articles (1), doctoral dissertations (0), books (0), book chapters (0)
  • Excluding self-citations: 1
  • Co-citations: 0
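With the growing popularity of the Internet, using search engines to look up information has become one of the most important activities on the Web, and understanding the search behavior of Web users has become an important foundation for many kinds of research. Usage logs are a valuable source for understanding users; query logs, for example, record query terms and the search process, and offer important clues for analyzing users' needs. The main purpose of this study is therefore to design a variety of log-based methods and, from them, to develop an integrated framework for effectively observing and analyzing Web users' search interests, as a basis for understanding users' information needs and improving Web retrieval systems.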
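The proposed framework covers three areas: classifying the query terms recorded by search engines into predefined subject categories, constructing a classification scheme suited to organizing query terms, and exploring the associations between subject categories in a hierarchical classification scheme. The experimental data comprise more than five million query log entries from three search engines, collected in different periods. The study first proposes a subject categorization method that integrates manual analysis with automatic processing and can efficiently classify large numbers of query terms; each subject category represents one type of search interest. The classification scheme itself was constructed through a systematic method based on popular search needs. Finally, because the subject classification is hierarchical, associations between different subject categories (that is, between search interests) are derived collaboratively, by analyzing users who exhibit similar usage behavior.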
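The categorization step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each query term has already been turned into a bag-of-words feature vector (for instance, from words in the search results retrieved for that term) and that a set of manually classified terms is available for training. The function names, the cosine-to-centroid classifier, the category labels, and the 0.2 threshold are all hypothetical choices, not the method reported in this study. Terms whose best score falls below the threshold come back without a category, mirroring the idea of combining automatic processing with manual analysis.

```python
from collections import Counter, defaultdict
import math


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_centroids(labeled):
    """labeled: iterable of (feature Counter, category) pairs for manually
    classified terms. Returns category -> summed feature vector (cosine
    similarity ignores vector length, so a sum works as well as a mean)."""
    centroids = defaultdict(Counter)
    for features, category in labeled:
        centroids[category].update(features)
    return dict(centroids)


def categorize(features, centroids, threshold=0.2):
    """Return (category, score) for the most similar centroid, or
    (None, score) when the best score is below `threshold`, meaning the
    term should be passed to a human classifier."""
    best, best_score = None, 0.0
    for category, centroid in centroids.items():
        score = cosine(features, centroid)
        if score > best_score:
            best, best_score = category, score
    if best_score < threshold:
        return None, best_score
    return best, best_score


# Hypothetical training data: feature vectors of two manually classified terms.
training = [
    (Counter({"mp3": 3, "song": 2, "singer": 1}), "Entertainment/Music"),
    (Counter({"stock": 3, "fund": 2, "price": 1}), "Finance/Investment"),
]
centroids = build_centroids(training)
print(categorize(Counter({"song": 2, "lyrics": 1}), centroids))
```

Raising the threshold sends more terms to manual review and keeps the automatic assignments more precise; lowering it does the opposite.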
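The results fall into three parts. The first concerns the analysis and observation of Web users' search interests. Preliminary findings indicate that users in Taiwan submit short queries, that a core set of query terms exists, and that a high proportion of queries are proper nouns; the analysis of search interests includes real-time observation of how queries are distributed across popular subject categories and how that distribution changes over different periods. The second part concerns constructing a classification scheme suited to organizing query terms. A preliminary scheme of 15 major categories and 100 subcategories was built, nearly 20,000 classified subject query terms were collected, and the terms in each category were analyzed for characteristics such as important query topics, search behavior patterns, and types of information needs. The third part attempts to understand, from the user's perspective, the associations among subject categories in a hierarchical classification system. A preliminary analysis of similar book-borrowing behavior within a library classification system uncovered a number of important non-hierarchical associations, whose meanings and types are then discussed.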
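The descriptive statistics mentioned above (query length, a core set of query terms, and category shares per period) can be computed from a log with a few counting passes. The sketch below is hypothetical: it measures length in characters because Chinese queries are not space-delimited, defines the core vocabulary as the most frequent distinct queries that together cover a chosen share of all query instances, and assumes that a lookup table from query to category already exists. The period labels and category names are made up for the example.

```python
from collections import Counter


def mean_query_length(queries):
    """Average query length in characters (Chinese queries have no spaces)."""
    return sum(len(q) for q in queries) / len(queries) if queries else 0.0


def core_vocabulary(queries, coverage=0.5):
    """Most frequent distinct queries that together account for at least
    `coverage` of all query instances in the log."""
    counts = Counter(queries)
    total = sum(counts.values())
    core, covered = [], 0
    for query, n in counts.most_common():
        if covered >= coverage * total:
            break
        core.append(query)
        covered += n
    return core


def category_distribution(log, category_of):
    """log: iterable of (period, query) pairs; category_of: query -> category.
    Returns {period: {category: share of that period's queries}}."""
    per_period = {}
    for period, query in log:
        category = category_of.get(query, "Unclassified")
        per_period.setdefault(period, Counter())[category] += 1
    result = {}
    for period, counts in per_period.items():
        total = sum(counts.values())
        result[period] = {cat: n / total for cat, n in counts.items()}
    return result


# Hypothetical log entries and category table.
log = [("2001-01", "mp3"), ("2001-01", "stock"),
       ("2001-02", "mp3"), ("2001-02", "mp3")]
categories = {"mp3": "Music", "stock": "Finance"}
print(category_distribution(log, categories))
# prints {'2001-01': {'Music': 0.5, 'Finance': 0.5}, '2001-02': {'Music': 1.0}}
```

Comparing the per-period dictionaries side by side is one simple way to watch category shares shift over time.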
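The experimental results show that the proposed framework and methods can effectively observe, in real time, the distribution of Web users' search interests and how it changes, and can systematically build a subject classification scheme oriented toward search interests. In addition, analyzing similar usage behavior yields many non-hierarchical associations, which allow the subject classification to keep pace with changing user needs and provide the flexibility of multiple links between search interests. While research on Web usage behavior has already received considerable attention abroad, this study is the first in Taiwan to use a large volume of query terms to investigate the behavior of Web users in Taiwan. Its results have considerable practical value both for understanding Web users' information needs and for improving the retrieval effectiveness of Web search systems. They can also support in-depth investigation in related fields such as communication, education, and e-commerce.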
The Web is a revolution in information access. Searching is by far the most common user activity on the Web, yet many users experience great frustration while searching. To fulfill the intent of a search, it is crucial to learn more about what users search for on the Web. This proposal therefore presents an integrated framework for studying Web search interests using various log-based approaches. The purpose is to develop effective methods for organizing and understanding search interests as expressed in users' queries on the Web. The framework consists of three main tasks: subject categorization of query terms from search engines, construction of a hierarchical subject taxonomy covering popular search interests, and discovery of associations between search interests in terms of the categories in the taxonomy.
Using logs containing over 5 million queries from three search engines in Taiwan, the study proposed feasible and systematic methods for studying Web search interests on a large scale. These methods include an auto-categorization approach for classifying query terms into a predefined taxonomy, a systematic approach to constructing a user-oriented subject taxonomy, and collaborative methods for discovering associated categories in the hierarchical taxonomy. At the current stage of the research, several initial results have been obtained: the frequency distributions of subject categories, as they respond to changes in users' search interests, can be systematically observed in real time; a two-level subject taxonomy of 15 major categories and 100 subcategories has been constructed based on grounded analysis of popular queries; and many highly associated categories across different subject hierarchies of the taxonomy have been discovered by analyzing the transaction patterns of similar users. The proposal also describes ongoing research topics, including evaluation of different feature sets for the auto-categorization approach, design of query-term clustering to assist in constructing the taxonomy, and investigation of the association types among the categories of the hierarchical taxonomy.
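One way to discover associated categories from the transaction patterns of similar users is an item-based similarity computation in the style of item-based collaborative filtering: each category is represented by the users who borrowed from (or searched in) it, and pairs of categories with similar user profiles are reported. The sketch below assumes this cosine-similarity formulation, a flat `parent_of` table giving each subcategory's major class, and a 0.5 similarity cutoff; the function names, category labels, and data are all hypothetical illustrations rather than the study's own procedure. Pairs under the same major class are skipped so that only associations crossing different subject hierarchies remain.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def category_vectors(transactions):
    """transactions: iterable of (user_id, category) records, e.g. one per
    loan. Returns category -> Counter mapping user_id to number of records."""
    vectors = defaultdict(Counter)
    for user, category in transactions:
        vectors[category][user] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse user-count vectors."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cross_hierarchy_associations(transactions, parent_of, min_sim=0.5):
    """Return (cat1, cat2, similarity) for category pairs used by similar
    users but filed under different major classes, strongest first.
    `parent_of` must map every category that appears in `transactions`."""
    vectors = category_vectors(transactions)
    pairs = []
    for c1, c2 in combinations(sorted(vectors), 2):
        if parent_of[c1] == parent_of[c2]:
            continue  # same branch: the hierarchy already links them
        sim = cosine(vectors[c1], vectors[c2])
        if sim >= min_sim:
            pairs.append((c1, c2, sim))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical loans: users u1 and u2 both borrow from A1 and B1.
loans = [("u1", "A1"), ("u1", "B1"), ("u2", "A1"), ("u2", "B1"),
         ("u3", "A2"), ("u4", "B2")]
parents = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}
print(cross_hierarchy_associations(loans, parents))
```

Here A1 and B1 sit in different major classes yet are borrowed by the same users, so they surface as a non-hierarchical association.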
The experimental results show that the framework can serve as a foundation for further research and is beneficial for related Web studies. Its applications are varied, falling mainly into three areas of Web information retrieval research: (1) it is valuable for the design of Web information retrieval systems, for example in implementing query filters; (2) it is useful for Web content organization, for example in collecting domain-specific vocabularies; and (3) it provides an alternative way to understand users' searching behaviors, for example in facilitating Web user studies.
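As one possible reading of the query-filter application in (1), a categorized query vocabulary can be reused to flag queries that fall into categories a service wants to block or route differently. The sketch is an assumption-laden illustration: `build_query_filter`, the vocabulary, and the blocked category name are hypothetical, and the substring fallback is used only because Chinese queries lack word delimiters.

```python
def build_query_filter(vocabulary, blocked_categories):
    """vocabulary: dict mapping a categorized query term to its category.
    Returns a predicate that is True when a query should be blocked."""
    blocked_terms = {
        term.lower() for term, category in vocabulary.items()
        if category in blocked_categories
    }

    def is_blocked(query):
        q = query.strip().lower()
        # Substring matching rather than token matching, because Chinese
        # queries are not delimited by spaces.
        return any(term in q for term in blocked_terms)

    return is_blocked


# Hypothetical vocabulary produced by a categorization step.
vocab = {"casino": "Gambling", "mp3": "Entertainment/Music", "賭場": "Gambling"}
is_blocked = build_query_filter(vocab, {"Gambling"})
print(is_blocked("Online Casino"), is_blocked("mp3 player"))
# prints True False
```

Substring matching is deliberately crude and will produce false positives when a blocked term appears inside an unrelated word; a production filter would segment the query first.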