Module

Scrubyt

TODO: if multiline messages aren’t needed, then remove them.

TODO: switch to the conventional Ruby logger interface,
or create an adapter to it. If the former, then decide what to
do with the unit tests.

== From scrubyt/output/result_dumper.rb

NOT USED ANY MORE

== From scrubyt/output/result.rb

NOT USED ANY MORE

Modules
FetchAction Since a lot of things happen during (and before) the fetching of a document, I decided to move fetching-related functionality out to a separate class - so if you are looking for anything that loads a document (even by submitting a form or clicking a link), or for related things like setting a proxy, you should find it here.
NavigationActions This class contains all the actions that are used to navigate web pages: first of all, fetch for downloading the pages, then various actions like filling text fields, submitting forms, clicking links and more.
Classes
AttributeFilter
BaseFilter A Scrubyt extractor is almost like a waterfall: water is pouring from the top until it reaches the bottom. The biggest difference is that instead of water, an HTML document travels through the space.
CompoundExample There are two types of string examples in scRUBYt! right now: the simple example and the compound example. The simple example is specified by a string, and a compound example is specified with :contains, :begins_with and :ends_with descriptors - each of which can be either a regexp or a string.
CompoundExampleLookup There are two types of string examples in scRUBYt! right now: the simple example and the compound example.
ConstantFilter
Constraint The two most trivial problems with a set of rules are that they match either fewer or more instances than we would like them to. Constraints are a way to remedy the second problem: they serve as a tool to filter out some result instances based on rules. A typical example:
ConstraintAdder Originally methods of Pattern - but since Pattern was already too heavy (and, after all, adding a constraint does not logically belong to Pattern anyway), they were moved to this utility class. In Pattern, everything that begins with ensure_ is automatically dispatched here.
DetailPageFilter
DownloadFilter
Export
Extractor Extractor is a performer class - it gets an extractor definition and carries out the actions and evaluates the wrappers sequentially.
HtmlSubtreeFilter
Logger Simple logger implementation, based on Scrubyt’s original logging style. Messages will be sent to STDERR. Logging can be limited to certain message levels by specifying them on initialization, e.g.
Pattern Serves as an umbrella for filters which are conceptually extracting the same thing - for example a price or a title or …
PostProcessor Some things cannot be carried out during evaluation - for example the ensure_presence_of_pattern constraint (since the evaluation is top to bottom, at a given point we don’t know yet whether the currently evaluated pattern will have a child pattern or not) or removing unneeded results caused by evaluating multiple filters.
PreFilterDocument Before the document is passed to Hpricot for parsing, we may need to do various things with it that are clumsy, inappropriate or impossible to do once the document is loaded.
RegexpFilter
Result
ResultDumper
ResultIndexer If the result is list-like (as opposed to a ‘hard’ result, like a price or a title), probably with a variable count of results (like tags, authors etc.), you may need just specific elements - like the last one, every third one, or those at specific indices. In this case you should use the select_indices syntax.
ResultNode
ScriptFilter
ScrubytResult
SharedUtils
SimpleExampleLookup There are two types of string examples in scRUBYt! right now: the simple example and the compound example.
TextFilter
TreeFilter
XPathUtils
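
The compound example described under CompoundExample can be illustrated with a small sketch. This is not scRUBYt!'s own code: the helper name matches_compound_example? and its hash argument are hypothetical, chosen only to show how :contains, :begins_with and :ends_with descriptors, each given as a string or a regexp, might be checked against a piece of text (Ruby 2.4 or later is assumed for match?).

```ruby
# Hypothetical helper, NOT part of scRUBYt!: returns true when the text
# satisfies every descriptor of a compound example.
def matches_compound_example?(text, descriptors)
  descriptors.all? do |kind, expected|
    case kind
    when :contains
      expected.is_a?(Regexp) ? text.match?(expected) : text.include?(expected)
    when :begins_with
      if expected.is_a?(Regexp)
        # Anchor the pattern at the start of the text, keeping its flags.
        text.match?(Regexp.new("\\A(?:#{expected.source})", expected.options))
      else
        text.start_with?(expected)
      end
    when :ends_with
      if expected.is_a?(Regexp)
        # Anchor the pattern at the end of the text, keeping its flags.
        text.match?(Regexp.new("(?:#{expected.source})\\z", expected.options))
      else
        text.end_with?(expected)
      end
    else
      raise ArgumentError, "unknown descriptor: #{kind.inspect}"
    end
  end
end

# Strings and regexps can be mixed freely within one compound example.
puts matches_compound_example?("Ruby Cookbook", { begins_with: "Ruby", ends_with: /book/ })
```

A simple example, by contrast, would just be the string itself; the sketch only covers the compound form.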
Public Methods
log
logger= Logging is disabled by default. It can be enabled as follows:
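
The logging behaviour described on this page (disabled by default, enabled by assigning a logger, messages sent to STDERR, optionally limited to certain levels) can be sketched as below. Everything in the block is an assumption made for illustration: LoggingSketch, SimpleLogger, the io: keyword and the message format are hypothetical stand-ins, not Scrubyt's actual classes or signatures.

```ruby
# Hypothetical sketch, NOT Scrubyt's real implementation.
module LoggingSketch
  # Stand-in for a simple logger that writes to STDERR.
  class SimpleLogger
    # levels: the message levels to log; an empty list means "log everything".
    # io: assumed keyword so the target stream can be swapped out.
    def initialize(*levels, io: $stderr)
      @levels = levels
      @io = io
    end

    def log(level, message)
      return unless @levels.empty? || @levels.include?(level)
      @io.puts "[#{level.to_s.upcase}] #{message}"
    end
  end

  class << self
    # Public: assigning a logger is what switches logging on.
    attr_writer :logger

    # Public: does nothing until a logger has been assigned.
    def log(level, message)
      logger.log(level, message) if logger
    end

    private

    # Private: nil by default, so logging starts out disabled.
    def logger
      @logger
    end
  end
end

LoggingSketch.log(:info, "ignored, no logger assigned yet")
LoggingSketch.logger = LoggingSketch::SimpleLogger.new(:warn, :error)
LoggingSketch.log(:info, "filtered out by the level list")
LoggingSketch.log(:error, "written to STDERR")
```

Keeping the reader private while exposing log and logger= mirrors the method lists on this page; whether the real module stores the logger the same way is not stated here.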
Private Methods
logger