Class

Pattern

Groups several filters into one.

Serves as an umbrella for filters which conceptually extract the same thing, for example a price, a title, …

Sometimes the same piece of information cannot be extracted with a single filter across all result instances. For example, a price has one XPath in record n, but record n+1 also has a discount price, so the real price is pushed to a different XPath. In this case, all the filters which extract the same thing are held in the same pattern.
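The idea of several filters held by one pattern can be illustrated with a minimal sketch. It is not the library's implementation: `SketchPattern`, `add_filter`, the hash-based records and the "first filter that yields a result wins" rule are all assumptions made for illustration.

```ruby
# Hypothetical sketch: NOT the real Pattern class.
# A "filter" is modelled as a block that returns a value or nil;
# a "record" is modelled as a Hash from XPath strings to text.
class SketchPattern
  attr_reader :name, :filters

  def initialize(name)
    @name = name
    @filters = []
  end

  # Register one more way of extracting the same piece of information.
  def add_filter(&block)
    @filters << block
    self
  end

  # Assumed policy: try the filters in order, return the first non-nil result.
  def evaluate(record)
    @filters.each do |filter|
      result = filter.call(record)
      return result unless result.nil?
    end
    nil
  end
end

price = SketchPattern.new(:price)
price.add_filter { |record| record["//tr/td[2]/span"] } # usual location
price.add_filter { |record| record["//tr/td[3]/span"] } # location when a discount is shown

plain_record      = { "//tr/td[2]/span" => "$10" }
discounted_record = { "//tr/td[2]/del" => "$12", "//tr/td[3]/span" => "$9" }

puts price.evaluate(plain_record)
puts price.evaluate(discounted_record)
```

Both records yield a price through the same pattern, even though a different filter matches in each case.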

Constants
PATTERN_OPTIONS These options can be set upon wrapper creation
VALID_OPTIONS
VALID_OUTPUT_TYPES Model patterns are shown in the output
VALID_PATTERN_TYPES
  # a root pattern represents a (surprise!) root pattern
  PATTERN_TYPE_ROOT = :PATTERN_TYPE_ROOT
  # a tree pattern represents a HTML region
  PATTERN_TYPE_TREE = :PATTERN_TYPE_TREE
  # represents an attribute of the node extracted by the parent pattern
  PATTERN_TYPE_ATTRIBUTE = :PATTERN_TYPE_ATTRIBUTE
  # represents a pattern which filters its output with a regexp
  PATTERN_TYPE_REGEXP = :PATTERN_TYPE_REGEXP
  # represents a pattern which crawls to the detail page and extracts information from there
  PATTERN_TYPE_DETAIL_PAGE = :PATTERN_TYPE_DETAIL_PAGE
  # represents a download pattern
  PATTERN_TYPE_DOWNLOAD = :PATTERN_TYPE_DOWNLOAD
  # write out the HTML subtree beginning at the matched element
  PATTERN_TYPE_HTML_SUBTREE = :PATTERN_TYPE_HTML_SUBTREE
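As a rough sketch of how such symbol constants might be collected and checked: the constant names and values below come from the listing above, while the enclosing `SketchPatternTypes` module, the assumption that `VALID_PATTERN_TYPES` is an array of these symbols, and the `valid_pattern_type?` helper are hypothetical.

```ruby
# Hypothetical sketch: the module and helper are illustrative only.
module SketchPatternTypes
  PATTERN_TYPE_ROOT         = :PATTERN_TYPE_ROOT
  PATTERN_TYPE_TREE         = :PATTERN_TYPE_TREE
  PATTERN_TYPE_ATTRIBUTE    = :PATTERN_TYPE_ATTRIBUTE
  PATTERN_TYPE_REGEXP       = :PATTERN_TYPE_REGEXP
  PATTERN_TYPE_DETAIL_PAGE  = :PATTERN_TYPE_DETAIL_PAGE
  PATTERN_TYPE_DOWNLOAD     = :PATTERN_TYPE_DOWNLOAD
  PATTERN_TYPE_HTML_SUBTREE = :PATTERN_TYPE_HTML_SUBTREE

  # Assumption: the list of valid types is simply an array of the symbols above.
  VALID_PATTERN_TYPES = [
    PATTERN_TYPE_ROOT, PATTERN_TYPE_TREE, PATTERN_TYPE_ATTRIBUTE,
    PATTERN_TYPE_REGEXP, PATTERN_TYPE_DETAIL_PAGE, PATTERN_TYPE_DOWNLOAD,
    PATTERN_TYPE_HTML_SUBTREE
  ].freeze

  # Hypothetical helper: true if the given symbol is a known pattern type.
  def self.valid_pattern_type?(type)
    VALID_PATTERN_TYPES.include?(type)
  end
end

puts SketchPatternTypes.valid_pattern_type?(:PATTERN_TYPE_REGEXP)
puts SketchPatternTypes.valid_pattern_type?(:PATTERN_TYPE_UNKNOWN)
```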
Public Attributes
children
constraints
extractor
filters
indices_to_extract
modifier_calls
name
next_page_url
options
parent
referenced_extractor
referenced_pattern
result_indexer
Public Methods
check_if_detail_page Checks whether the currently created pattern is a detail pattern (i.e. it references a subextractor). Also checks whether the currently created pattern is an ancestor of a detail pattern and, if so, stores this in a hash (to be able to traverse the pattern structure on detail pages as well).
check_if_shortcut_pattern Shortcut patterns, as their name says, are a shortcut for creating patterns from predefined rules; for example:
current=
evaluate
filter_count
generate_relative_XPaths
method_missing
method_missing Dispatcher function. The class was already too big, so I decided to factor out some methods, grouped by functionality (such as output or adding constraints), into utility classes.
new
parent_of_leaf
parse_child_patterns
to_sexp
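The dispatcher role described for method_missing can be sketched as follows. This is only an illustration of the general Ruby technique under stated assumptions: `SketchOutput`, `SketchConstraints`, `SketchDispatchingPattern` and their methods are hypothetical names and do not reflect the library's actual utility classes.

```ruby
# Hypothetical utility modules holding functionality factored out of the pattern.
module SketchOutput
  def self.to_text(pattern)
    "pattern: #{pattern.name}"
  end
end

module SketchConstraints
  def self.add_constraint(pattern, constraint)
    pattern.constraints << constraint
    pattern
  end
end

# Hypothetical pattern class that forwards unknown calls to the utilities.
class SketchDispatchingPattern
  UTILITIES = [SketchOutput, SketchConstraints].freeze

  attr_reader :name, :constraints

  def initialize(name)
    @name = name
    @constraints = []
  end

  # Forward the call to the first utility that explicitly defines it,
  # passing the pattern itself as the first argument.
  def method_missing(method_name, *args, &block)
    utility = UTILITIES.find { |u| u.singleton_methods.include?(method_name) }
    return super unless utility

    utility.public_send(method_name, self, *args, &block)
  end

  # Keep respond_to? consistent with the dispatching above.
  def respond_to_missing?(method_name, include_private = false)
    UTILITIES.any? { |u| u.singleton_methods.include?(method_name) } || super
  end
end

pattern = SketchDispatchingPattern.new(:price)
pattern.add_constraint(:must_contain_digit)
puts pattern.to_text
```

Checking `singleton_methods` rather than `respond_to?` keeps the dispatcher from forwarding calls that every module answers (such as `instance_methods`), so unknown methods still raise `NoMethodError`.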
Private Methods
check_option
look_for_examples
parse_options_hash