Hybrid-Search and Storage of Semi-structured Information
Eytan Adar

The World Wide Web is generally experienced as a single snapshot: the Now Web. The Now Web perspective means that only very recent states of the Web are being observed through browsers or through the crawling and index structures of search engines. Such a view ignores the utility of historical data to system designers and end-users alike. This dissertation characterizes content and behavior within the Dynamic Web. This work further illustrates the power of Dynamic Web data by utilizing measures of content change and behavioral measures of use to identify likely targets of user revisitation on the page level. Additionally, two systems are offered for manipulating temporal Web data:

  • DTWExplorer, a tool for analyzing time-series representations of content and behavior to identify correlations in different types of temporal Web data.
  • Zoetrope, an end-user tool for querying, aggregating, and visualizing historical Web data from the context of the Now Web.
This work capitalizes on the richness of the Dynamic Web and demonstrates how moving away from temporally insensitive models and tools can enhance existing applications (e.g., crawlers and search engines) and lead to novel applications and systems.

Adar, Eytan, "Temporal-Informatics of the WWW", Dissertation, University of Washington, 2009.
Available as: PDF (9.7Mb)