PageVisualize.exe v1.0.0

PageVisualize Website Image Capture SDK

Copyright (c) 2004 Lucid Step Software

Introduction

The PageVisualize.exe v1.0.0 executable is a program that is used to capture images of websites. It can be used in GUI Mode (Graphical User Interface Mode) or Console Mode (which is used inside a DOS command prompt window). In GUI mode, a Windows user interface is presented with controls to configure the various available options. In Console Mode, options are submitted as command line switches. Both the GUI Mode and the Console Mode have the capability to process PageVisualize Command files (.pvc files), which are text files that contain a sequence of option statements and page image capture commands.

The program is multithreaded, and is therefore capable of performing multiple webpage image capture operations simultaneously. It is very flexible and provides many options to customize the website image capturing process. It can be used manually with a human operator present, or can be used in an automated manner on a server with no human user present. The PageVisualize Command file processing functionality is ideal for automating the page capturing process. The page capturing engine is very fast and is capable of processing very large page capture batches.

Getting Started

To get started using the PageVisualize program, either go to the Windows Start Menu, Programs menu, PageVisualize submenu, and click the shortcut to start the program in GUI mode, or open a command prompt, change the current directory to the PageVisualize application folder ("C:\Program Files\PageVisualize\Application\" by default), and type "PageVisualize" at the command prompt to display a usage summary.

GUI Mode

When you run the installer, by default a PageVisualize menu is added to your Start Menu. The menu contains a shortcut to launch the PageVisualize software in GUI mode by passing in the "/gui" command line switch ("InstallPath\PageVisualize.exe /gui"). After the program has started, you can configure the options as desired using the controls provided. See the Options Reference section of this document for an explanation of each option.

Once the options are configured as desired, enter the URL of the page you would like to capture into the URL edit box. Then click on the "Capture Page Image" button to capture an image of the webpage. After the image has been captured, an image file is saved to disk. Also, the most recently captured image is displayed on the Captured Images tab of the PageVisualize main window. If the "LogFileName" option is blank, log messages are displayed on the "Log Messages" tab of the PageVisualize main window. Otherwise they are saved to the specified log file.

If the "Asynchronous" option is checked when you click the "Capture Page Image" button, you don't need to wait for the capture operation to finish before entering a new URL and clicking the button again. In fact you can enter new URLs and click the "Capture Page Image" button as many times as desired without waiting for the capture operations to complete, because the page capture processing will continue running asynchronously on background threads. If the "Asynchronous" option is not checked when you click the "Capture Page Image" button, you will need to wait for the capture operation to complete before you can do anything else.

When you configure the various options that are available to control the capturing process, you may notice that the text in the "Sample Command Line Options" text box and the "Sample Command File Page Capture Options" text box changes to reflect the options that you have configured. These sample option strings can be copied out of the text boxes and used on the command line (if you copied the "Sample Command Line Options" text) or in a PageVisualize Command File (if you copied the "Sample Command File Page Capture Options" text). The auto-generated Sample Options text string feature is intended to make it easy to create a set of command line options that you can paste into a command prompt window, or a command file capture command string that you can paste into a command file, without needing to remember or look up all of the available options. All you need to do is use the GUI to select the desired options, and the appropriate option strings are generated for you automatically.

To process a PageVisualize Command file (.pvc file), enter the file name into the CommandFileName edit box, or click on the "..." button to the right of the CommandFileName edit box to display a file dialog that you can use to select the file name. Then click on the "Go" button. You can observe the progress of the command file capturing processing by watching the "Status" panel, switching to the "Captured Images" panel, or watching the log output. The log output will either be displayed on the "Log Messages" tab or saved to a log file, depending on whether the "LogFileName" option has been set to a non-blank value. The "LogFileName" option can be set using the edit box provided in the GUI interface, or by using an option statement in the command file.

The "Status" panel gives you feedback about the progress in processing the page capture requests that you have submitted when you pressed either the "Capture Page Image" button (to capture a single page image) or the "Go" button (to execute a PageVisualize Command file, which is generally used to capture multiple page images). The number of capture operations queued for processing (waiting to be processed) is displayed next to the "Queued:" label. The number of capture operations currently processing (in progress) is displayed next to the "Capturing:" label. The number of capture operations that have already successfully completed is displayed next to the "Completed:" label. The number of capture operations that have timed out before completing is displayed next to the "Timed Out:" label. The number of capture operations that have failed due to errors is displayed next to the "Failed:" label. The number of capture commands that have been parsed without actually being processed is displayed next to the "Parsed w/o Processing:" label.

Console Mode

To use the PageVisualize software in Console Mode, first open a command prompt window. Then change the current working directory to "C:\Program Files\PageVisualize\Application\", or to the directory in which the PageVisualize.exe program is located if you installed the software somewhere other than the default location. Alternatively you can add the directory where the PageVisualize.exe program is located to the system file search path, and then it won't be necessary to change the current working directory before running the program. A third option is to enter the fully qualified PageVisualize.exe file name (including the path information) whenever you need to run the program.

Enter "PageVisualize" at the command prompt and press enter to run the program. If you don't enter any command line switches, or if you enter "/?" or "/h" as a command line switch, a basic usage summary will be displayed.

The most basic way to use the PageVisualize program in command line mode is to enter the executable name followed by one or more URLs, like the following:

PageVisualize http://www.example.com

This will initiate a page capture operation for the specified URL using the default options. To use options other than the default, enter the command line switches prior to the URL(s) to which they should be applied, like the following:

PageVisualize /ImageResizePercent=50 /ImageFileName=ExamplePageCapture http://www.example.com

This will capture the page, resize it to 50 percent of the width and height of the default capture area (from 800x600 to 400x300), and save the image in the default image format (PNG) to the filename "ExamplePageCapture.png" in the current working directory. Note that if the URL were placed before the options, the options would not apply to the page capture of that URL.
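
The resize arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and rounding down to whole pixels is an assumption, since this document does not say how fractional results are handled.

```python
def resized_dimensions(width, height, resize_percent):
    """Scale a capture area by an ImageResizePercent value.

    Rounding down to whole pixels is an assumption, not documented behavior.
    """
    return (width * resize_percent // 100, height * resize_percent // 100)


# The default 800x600 capture area at /ImageResizePercent=50
print(resized_dimensions(800, 600, 50))  # (400, 300)
```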

Most command line switch names and values are case insensitive. The primary exception to this rule is the "CaptureDownloadOptions" switch, which uses case sensitive option values. Also, regarding the "CapturePageURL" option, some URLs may be case sensitive, but this depends on the server software used to serve up the webpage (Unix-based and Linux-based webservers generally treat the directory and filename portion of the URL in a case-sensitive manner, while Windows-based webservers generally treat the entire URL in a case-insensitive manner).

Abbreviated command line switch option names are available to shorten the command line text that must be entered. In most cases the abbreviated option name is simply the first letter of each word of the full-length option name. A shortened example that is equivalent to the previous example follows:

PageVisualize /irp=50 /ifn=ExamplePageCapture http://www.example.com
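
The "first letter of each word" rule can be illustrated with a short Python sketch. The helper name is hypothetical, and the rule only holds "in most cases": for example, this sketch produces "cmptk" for CaptureMinProgressToKeep, while the Options Reference lists "cmp", so always check the reference for the actual abbreviation.

```python
import re


def abbreviate(option_name):
    """Guess an abbreviated option name from the first letter of each word.

    Splits a CamelCase name into words (treating a run of capitals such as
    "URL" as one word) and joins the lowercase initials.
    """
    words = re.findall(r"[A-Z][a-z]+|[A-Z]+(?![a-z])", option_name)
    return "".join(word[0].lower() for word in words)


print(abbreviate("ImageResizePercent"))  # irp
print(abbreviate("ImageFileName"))       # ifn
```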

In addition to the "/" (forward slash) character, the "-" (hyphen) character may be used as the first character of a command line switch. The following example is equivalent to the previous example:

PageVisualize -irp=50 -ifn=ExamplePageCapture http://www.example.com

Notice in the examples above that each command line switch consists of a single switch prefix character ("/" or "-"), the option full-length name or abbreviated name, the "=" (equals sign) character, and the option value. If the option value has spaces, it should be encapsulated in double quotes, as in the following example:

PageVisualize /irp=50 /ifn="Name With Spaces" http://www.example.com

In no case should there be spaces either immediately before or immediately after the "=" (equals sign) character. If spaces are included between the option name and the "=" character, or between the "=" character and the option value, the command line switch will not be parsed correctly and an error condition will result.
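
The switch syntax rules above can be summarized as a small Python sketch. It is not the parser used by PageVisualize.exe; the function name and error messages are hypothetical, and it only checks the rules stated in this section (prefix character, no spaces around "=", optional double quotes around the value).

```python
def parse_switch(token):
    """Split one command line switch into (name, value).

    Returns value None when the switch has no "=value" part (e.g. "/gui").
    """
    if token[:1] not in ("/", "-"):
        raise ValueError("not a switch: " + repr(token))
    name, sep, value = token[1:].partition("=")
    # Spaces directly before or after "=" are a syntax error.
    if not name or name != name.strip() or (sep and value != value.lstrip()):
        raise ValueError("malformed switch: " + repr(token))
    # Values containing spaces are wrapped in double quotes.
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]
    return name.lower(), (value if sep else None)


print(parse_switch("/irp=50"))                    # ('irp', '50')
print(parse_switch('-ifn="Name With Spaces"'))    # ('ifn', 'Name With Spaces')
```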

It is acceptable to include multiple URLs in a single command line, as in the following:

PageVisualize /irp=50 /ift=JPEG /ifn=JPEGImage www.example.com /ift=GIF /ifn=GIFImage www.example.com

In order for an option to be in effect for a specific URL, it must precede the URL in the command line string. In the case where an option is repeated, the rightmost occurrence of the option that precedes the specific URL is the one that will be in effect when a page capture is performed for the URL. In the example above, the "/irp" (ImageResizePercent) option precedes both URLs and thus is in effect for both URLs. The first occurrences of the "/ift" (ImageFileType) and "/ifn" (ImageFileName) options precede the first URL and are in effect when it is captured, but they are overridden by the second occurrences of the "/ift" and "/ifn" options, which specify new values for these options, so the new values are in effect for the second URL. Thus two images will be captured and saved, one as a JPEG image named "JPEGImage.jpg", and one as a GIF image named "GIFImage.gif". Note that since the protocol prefix is omitted from the URLs, the "http://" prefix is assumed.
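
The left-to-right precedence rule can be modeled with the following Python sketch. It is an illustration under simplifying assumptions, not the program's actual logic: the function name is hypothetical, full-length and abbreviated names are not unified, and quoted values are not handled.

```python
def plan_captures(tokens):
    """Pair each URL with the option values in effect at its position."""
    current = {}   # most recent value seen for each option
    plan = []
    for token in tokens:
        if token[:1] in ("/", "-"):
            name, _, value = token[1:].partition("=")
            current[name.lower()] = value   # rightmost occurrence wins
        else:
            # A missing protocol prefix is assumed to be "http://".
            url = token if "://" in token else "http://" + token
            plan.append((url, dict(current)))
    return plan


tokens = ("/irp=50 /ift=JPEG /ifn=JPEGImage www.example.com "
          "/ift=GIF /ifn=GIFImage www.example.com").split()
for url, options in plan_captures(tokens):
    print(url, options)
```

Running the sketch on the example command line yields two planned captures: both use irp=50, the first uses JPEG and "JPEGImage", and the second uses GIF and "GIFImage".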

To process a command file from the command line, use the "CommandFileName" option switch, as in the following example:

PageVisualize /CommandFileName=Sample-Basic.pvc

It is acceptable to mix various options, URLs, and command file names all in the same command line statement. If this is done, the options and commands will be processed in the sequence in which they are encountered on the command line and within the command file(s).

PageVisualize Command Files

A PageVisualize Command File (.pvc file) is a text file that contains global options and page capture commands. Any standard text editor, such as Windows Notepad, can be used to create the command file.

In command files, global options and capture commands use the same general set of option names as command line switches. However, some option names are permitted in one place but not another. For information about where each option name is permitted to be used, see the Options Reference section of this document.

In the command file, each global option or page capture command is listed on its own line. The first non-whitespace character in each line determines whether the line will be treated as a comment, a global option, or a capture command. Beginning and trailing whitespace characters are stripped off before processing each line, and are ignored.

If the first character of the line is either ";" or "#", the line is treated as a comment and will be ignored. The comment string terminates at the end of the line. To create a multi-line comment, prefix each line of the comment with a comment character. An example follows:

# This is a comment that

; spans two lines of text

If the first character of the line is either "/" or "-", the line is treated as a global option. Global options are similar to command line switches in that they take effect for all URLs and capture commands which they precede. Also, the syntax of global options is essentially the same as the syntax for command line switches. The primary difference is that only one global option is permitted per line of text in a command file, whereas multiple command line switches are permitted (and expected) on the same command line. For a description of the syntax of command file global option lines, please see the description of the syntax of command line options in the Console Mode section of this document. An example follows:

/LogFileName=PageVisualizeLogMessages.log

If the first character of the line is the "|" (pipe) character, the line is treated as a capture command. Each capture command consists of a pipe-delimited list of OptionName=value strings. Each option included in the capture command is effective only for the single page capture represented by the capture command in which it is embedded. For each option not included in the capture command, the value from the most recent preceding instance of the equivalent global option is effective, or the default value for the option is effective if there is no preceding instance of the equivalent global option. Either full-length or abbreviated option names may be used. In a capture command, the option name must not be preceded by a prefix character, unlike global options and command line switches. Spaces may optionally be present before the option name or after the option value, but spaces must not be present between the option name and the "=" character or between the "=" character and the option value. An example of a capture command follows:

| irp=50 | ifn="Name With Spaces" | http://www.example.com |
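
To make the capture command layout concrete, here is a minimal Python sketch that splits such a line into a URL and its per-capture options. The function name is hypothetical, and the sketch makes simplifying assumptions: it does not validate option names or spacing around "=", and it does not support a "|" character inside a quoted value.

```python
import re


def parse_capture_command(line):
    """Return (url, options) for one pipe-delimited capture command."""
    line = line.strip()
    if not line.startswith("|"):
        raise ValueError("capture commands start with '|'")
    options, url = {}, None
    for field in line.split("|"):
        field = field.strip()
        if not field:
            continue  # empty text before the first or after the last pipe
        match = re.match(r"([A-Za-z]+)=(.*)$", field)
        if not match:
            url = field  # a field without "Name=" is taken to be the URL
            continue
        name, value = match.group(1).lower(), match.group(2)
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        if name in ("url", "capturepageurl"):
            url = value
        else:
            options[name] = value
    return url, options


print(parse_capture_command(
    '| irp=50 | ifn="Name With Spaces" | http://www.example.com |'))
```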

If the first character of the line is none of the above-mentioned characters, the line is assumed to be a single URL, and is treated as a capture command that uses the global option values that are currently in effect. For any global option that is not specified somewhere in the file prior to the URL capture line, the default value for that option is in effect. An example of a single-URL capture line follows:

http://www.example.com
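
The first-character rules for command file lines can be summarized in a short Python sketch. The function name and the returned labels are hypothetical, and the treatment of blank lines is an assumption, since this document does not say how empty lines are handled.

```python
def classify_line(line):
    """Classify one command file line by its first non-whitespace character."""
    text = line.strip()  # leading and trailing whitespace is ignored
    if not text:
        return "blank"   # assumption: empty lines are skipped
    first = text[0]
    if first in ";#":
        return "comment"
    if first in "/-":
        return "global_option"
    if first == "|":
        return "capture_command"
    return "url"


sample = [
    "# capture the example site",
    "/LogFileName=PageVisualizeLogMessages.log",
    "| irp=50 | http://www.example.com |",
    "http://www.example.com",
]
for line in sample:
    print(classify_line(line), "<-", line)
```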

For detailed sample PageVisualize Command Files, please look in the "PVCFiles" subfolder of the folder where the PageVisualize software was installed.

Options Reference

The following is a list of all available options. The options can be used as command line switches, as global options in a PageVisualize Command File (.pvc file), or within capture commands in a PageVisualize Command File. All options are permitted to be used as command line switches. All options except the following (and their abbreviated equivalents) are permitted to be used as global options in a command file: Help, GraphicalUserInterface, CommandFileName, CapturePageURL, ImageFileName. All options except the following (and their abbreviated equivalents) are permitted to be used within capture commands in a command file: Help, MaxCaptureThreads, BatchTimeoutSeconds, ParseWithoutProcessing, GraphicalUserInterface, CommandFileName, LogFileName, LogFileOption.

/Help[=value], /hlp[=value], /h[=value], /?[=value]

   BRIEF DESCRIPTION

   Displays usage summary or usage information for a specific option.

   ACCEPTED VALUES

   The value may be blank, "all", "summary", or an option name.

   DETAILED DESCRIPTION

   Use the "Help" option to get information on how to use the program.

   If the value is left blank, a basic usage summary is displayed. If

   the value "all" is specified, more detailed help information is

   displayed for all available options, in addition to a basic usage

   summary. If the value "summary" is specified, a basic usage summary

   is displayed. If a specific option name is specified, more detailed

   help information is displayed for the specific option, in addition to

   a basic usage summary.

   EXAMPLES

   /Help=MaxCaptureThreads

   /hlp=gui

   /h=all

   /?

/CommandFileName=value, /cfn=value

   BRIEF DESCRIPTION

   Specifies the name of a PageVisualize command file (.pvc file) to be

   processed.

   ACCEPTED VALUES

   The value must be the name of a valid PageVisualize command file.

   DETAILED DESCRIPTION

   The "CommandFileName" option is used to specify the name of a

   PageVisualize command file (.pvc file), which is a text file that

   contains special options and commands to be processed by the

   PageVisualize website image capturing engine. The name may be a fully

   qualified file name (including path information), or a relative file

   name. The options and capture commands in the file must conform with

   the PageVisualize command file syntax. In its most basic form, the

   command file may contain simply a list of URLs, one URL per line. For

   a detailed description of the PageVisualize command file syntax,

   please see the PageVisualize documentation.

   EXAMPLES

   /CommandFileName=Sample-Basic.pvc

   /cfn="C:\Program Files\PageVisualize\PVCFiles\Sample-Complete.PVC"

/LogFileName=value, /lfn=value

   BRIEF DESCRIPTION

   Specifies the name of a file to which log messages will be saved.

   ACCEPTED VALUES

   The value must either be blank, or a valid file name.

   DETAILED DESCRIPTION

   The "LogFileName" option is used to specify the name of a file to

   which log messages will be saved. Log messages may be generated by

   various events, including the successful capture of a page image, or

   the failure to capture a page image. If the log file does not already

   exist when a message is logged, it will be created. If it does

   already exist, the message will be appended to the end of the file.

   The log file is written in a plain text format. If no log file name

   is specified, or if a blank file name is specified, log messages will

   be output to the console.

   EXAMPLES

   /LogFileName=PageVisualize.log

   /lfn=output.txt

/LogFileOption=value, /lfo=value

   BRIEF DESCRIPTION

   Controls whether messages are logged on success, on failure, or both.

   ACCEPTED VALUES

   "OnSuccess", "OnFailure", "OnSuccessOrFailure", "s", "f", "sf".

   The default value is "OnSuccessOrFailure".

   DETAILED DESCRIPTION

   Use "LogFileOption" to control what types of messages are logged. If

   "OnSuccess" or "s" is specified, only success messages will be logged.

   If "OnFailure" or "f" is specified, only failure messages will be

   logged. If "OnSuccessOrFailure" or "sf" is specified, both success

   messages and failure messages will be logged.

   EXAMPLES

   /LogFileOption=OnFailure

   /lfo=OnSuccessOrFailure

   /lfo=sf

   /lfo=f

/MaxCaptureThreads=value, /mct=value

   BRIEF DESCRIPTION

   Maximum number of simultaneous asynchronous image capture threads.

   ACCEPTED VALUES

   The value must be an integer in the range 1 - 9999.

   The default value is 10.

   DETAILED DESCRIPTION

   The "MaxCaptureThreads" option is used to configure the maximum number

   of simultaneous threads that will be used for asynchronous website

   image capture operations. If the value is too low, the page capturing

   engine will capture fewer website images in parallel and will thus be

   unable to utilize fully the available hardware resources. If the

   value is too high, the high processing demands may overwhelm the

   hardware and the machine may become sluggish or unresponsive.

   EXAMPLES

   /MaxCaptureThreads=30

   /mct=5

/BatchTimeoutSeconds=value, /bts=value

   BRIEF DESCRIPTION

   Maximum number of seconds before the batch capture will time out.

   ACCEPTED VALUES

   The value must be an integer in the range 1 - 9999999.

   The default value is 600 (10 minutes).

   DETAILED DESCRIPTION

   The "BatchTimeoutSeconds" option is used to control the amount of time

   that the page capturing engine will wait before timing out a batch of

   image capture operations. If the number of page captures in the batch

   is large, the value for this option should be increased to allow

   sufficient time for the batch to complete. If the batch times out

   prior to completion, the page capture operations that completed prior

   to the timeout will have been already saved to disk and will not be

   deleted.

   EXAMPLES

   /BatchTimeoutSeconds=6000

   /bts=100

/ParseWithoutProcessing[=value], /pwp[=value]

   BRIEF DESCRIPTION

   Enables or disables parsing (syntax checking) without image capturing.

   ACCEPTED VALUES

   "1", "yes", "true", "enable"; "0", "no", "false", "disable".

   If this option is not specified, the default value is "0" or

   "disable". If this option is specified without a value, the default

   value is "1" or "enable".

   DETAILED DESCRIPTION

   If the "ParseWithoutProcessing" option is enabled (using a value of

   "1", "yes", "true", or "enable"), then all input will be parsed and

   checked for syntax errors, but no page captures will be processed. If

   this option is disabled (the default), then both parsing and normal

   processing will occur. This option can be enabled to check a large

   batch for syntax problems prior to the actual processing of the batch.

   This option is most useful in conjunction with the "CommandFileName"

   option.

   EXAMPLES

   /ParseWithoutProcessing=enable

   /pwp=1

   /pwp
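
The accepted values for an enable/disable switch such as "ParseWithoutProcessing" can be sketched in Python as follows. The function name is hypothetical; case-insensitive matching follows the general rule stated in the Console Mode section, and rejecting unknown values with an error is an assumption.

```python
ENABLE_VALUES = {"1", "yes", "true", "enable"}
DISABLE_VALUES = {"0", "no", "false", "disable"}


def parse_flag(value=None):
    """Interpret the value of an enable/disable switch.

    A switch given without a value (e.g. "/pwp") counts as enabled.
    """
    if value is None or value == "":
        return True
    text = value.lower()
    if text in ENABLE_VALUES:
        return True
    if text in DISABLE_VALUES:
        return False
    raise ValueError("unrecognized flag value: " + repr(value))


print(parse_flag("enable"), parse_flag("0"), parse_flag())  # True False True
```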

/GraphicalUserInterface[=value], /gui[=value]

   BRIEF DESCRIPTION

   Controls whether the program runs in GUI mode or console mode.

   ACCEPTED VALUES

   "1", "yes", "true", "enable"; "0", "no", "false", "disable".

   If this option is not specified, the default value is "0" or

   "disable". If this option is specified without a value, the default

   value is "1" or "enable".

   DETAILED DESCRIPTION

   The "GraphicalUserInterface" option is used to control whether the

   program will run in GUI mode (Graphical User Interface mode) or in

   console mode. If GUI mode is enabled, a Windows graphical user

   interface will be displayed. If GUI mode is disabled, the program

   will run in console mode.

   EXAMPLES

   /GraphicalUserInterface=true

   /gui=yes

   /gui

/CapturePageURL=value, /url=value

   BRIEF DESCRIPTION

   The URL of the website for which an image is to be captured.

   ACCEPTED VALUES

   The value must be a valid URL that points to an HTML, XML, or text

   document (the file extension should be .html, .htm, .xml, or .txt).

   The protocol prefix should be "http://", "https://", or "file://"

   ("ftp://" and other protocols are disallowed).

   DETAILED DESCRIPTION

   The "CapturePageURL" option is used to specify the URL of the website

   or local HTML document that should be used to render an image, which

   is then captured and saved to a file. Since the "CapturePageURL"

   option is the default when no option name is specified, it is

   acceptable to specify one or more URLs while including neither the

   long option name ("CapturePageURL") nor the short option name ("url").

   If the URLs are specified on the command line, they should be

   separated by spaces. If the URLs are specified in a command file,

   each URL should be on a line by itself in the command text file. Note

   that options on the command line take effect in the order in which

   they appear on the command line, so each "CapturePageURL" value should

   always appear in sequence on the command line in a position that is

   after any options that should be applied to the page capture for the

   specific URL.

   EXAMPLES

   /CapturePageURL=http://www.example.com

   /url=http://www.example.com

   http://www.example.com

/CapturePageWidth=value, /cpw=value

   BRIEF DESCRIPTION

   The width of the website page capture area in pixels.

   ACCEPTED VALUES

   The value must be an integer in the range 1 - 9999.

   The default value is 800.

   DETAILED DESCRIPTION

   The "CapturePageWidth" option is used to set the width in pixels of

   the rectangular area in which the page will be rendered and from

   which a page image will be captured. Many websites are designed to

   fit a width of 800, so this is the default value. The value can be

   adjusted up or down as needed to fit the desired area.

   EXAMPLES

   /CapturePageWidth=800

   /cpw=1024

/CapturePageHeight=value, /cph=value

   BRIEF DESCRIPTION

   The height of the website page capture area in pixels.

   ACCEPTED VALUES

   The value must be an integer in the range 1 - 9999.

   The default value is 600.

   DETAILED DESCRIPTION

   The "CapturePageHeight" option is used to set the height in pixels of

   the rectangular area in which the page will be rendered and from

   which a page image will be captured. An image with a height of 600

   will fit on most displays, and this is the default value. However, a

   height of 600 will usually not be sufficient to capture the entire

   length of the webpage, so if this is the objective, the value should

   be increased as needed to capture the full height of the page, or as

   much as needed. However, if the objective is only to provide a

   preview image of the website, the default of 600 is generally

   sufficient.

   EXAMPLES

   /CapturePageHeight=600

   /cph=768

/CaptureTimeoutSeconds=value, /cts=value

   BRIEF DESCRIPTION

   Maximum number of seconds before the page capture will time out.

   ACCEPTED VALUES

   The value must be an integer in the range 1 - 9999.

   The default value is 60 (one minute).

   DETAILED DESCRIPTION

   The "CaptureTimeoutSeconds" option is used to control the amount of

   time that the page capturing engine will wait before timing out a

   specific page image capture operation. If the specific page to be

   captured is slow to load, the value for this option should be

   increased to allow sufficient time for the page to load before an

   image is captured. If the page capture times out prior to completion,

   an image may or may not be captured and saved depending on the

   page display progress percentage at timeout and the value of the

   "CaptureMinProgressToKeep" option.

   EXAMPLES

   /CaptureTimeoutSeconds=120

   /cts=40

/CaptureMinProgressToKeep=value, /cmp=value

   BRIEF DESCRIPTION

   Minimum page display progress percent to keep if the page times out.

   ACCEPTED VALUES

   The value must be an integer in the range 1 - 100.

   The default value is 100.

   DETAILED DESCRIPTION

   The "CaptureMinProgressToKeep" option may be used to configure the

   minimum page load and display progress percentage that will be kept

   and captured in the event that the page loading and rendering process

   times out before reaching 100 percent completion. This value has no

   effect for pages that complete the loading and rendering process

   before the timeout (as set by the "CaptureTimeoutSeconds" option)

   expires. For pages that time out prior to 100 percent completion,

   if the completion percentage is not equal to or greater than the value

   of "CaptureMinProgressToKeep" then the operation fails and the

   partially-downloaded page is discarded. Otherwise, the page image is

   captured and saved, and if the completion percentage was less than 100

   but equal to or greater than the value of "CaptureMinProgressToKeep",

   a message is logged in addition to saving the captured image. This

   option should generally be left at the default value of 100, since if

   it is set to a lower value then an image may be captured of a

   partially-downloaded page, which may in some circumstances even appear

   to be blank.

   EXAMPLES

   /CaptureMinProgressToKeep=90

   /cmp=75
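
The decision described above for a capture that times out can be expressed as a small Python sketch. The function name and the returned labels are hypothetical; only the comparison logic is taken from the description.

```python
def timeout_outcome(progress_percent, min_progress_to_keep=100):
    """Outcome for a page capture that timed out at progress_percent."""
    if progress_percent < min_progress_to_keep:
        return "failed_discarded"         # below the threshold: nothing saved
    if progress_percent < 100:
        return "saved_with_log_message"   # partial page kept, message logged
    return "saved"


print(timeout_outcome(80, min_progress_to_keep=75))  # saved_with_log_message
print(timeout_outcome(80, min_progress_to_keep=90))  # failed_discarded
```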

/CaptureDownloadOptions=value, /cdo=value

   BRIEF DESCRIPTION

   Options used to control the webpage download process.

   ACCEPTED VALUES

   The value must be a string formed from the concatenation of zero

   or more of the following, without any spaces or other delimiters:

   I, i, V, v, F, f, X, x, J, j, S, s, U, u, M, m, B, b, C, c, O, o, R, r, G, g

   The default value is "IvFxjSUmbcOrg".

   DETAILED DESCRIPTION

   Use "CaptureDownloadOptions" to control various aspects of the webpage

   download process. Each letter in the option string either enables or

   disables a specific download option. An uppercase letter enables the

   corresponding option, and a lowercase letter disables the

   corresponding option. The option letters and their meanings are as

   follows:

    I or i - AllowImages or DisallowImages

    V or v - AllowVideos or DisallowVideos

    F or f - AllowFrames or DisallowFrames

    X or x - AllowActiveX or DisallowActiveX

    J or j - AllowJava or DisallowJava

    S or s - AllowScripts or DisallowScripts

    U or u - AllowUTF8 or DisallowUTF8

    M or m - AllowMetaCharSet or DisallowMetaCharSet

    B or b - AllowBehaviors or DisallowBehaviors

    C or c - AllowClientPull or DisallowClientPull

    O or o - AllowOffline or DisallowOffline

    R or r - AllowForceOffline or DisallowForceOffline

    G or g - AllowIgnoreCache or DisallowIgnoreCache

   EXAMPLES

   /CaptureDownloadOptions=IvFxjSUmbcOrg

   /cdo=xjS
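
As an illustration of the letter scheme, the following Python sketch decodes a "CaptureDownloadOptions" string into named settings. The function and table names are hypothetical. How the program treats letters that are left out of the string is not stated here, so the sketch simply omits them from its result.

```python
# Letter-to-setting table taken from the list above.
DOWNLOAD_OPTIONS = {
    "I": "Images", "V": "Videos", "F": "Frames", "X": "ActiveX",
    "J": "Java", "S": "Scripts", "U": "UTF8", "M": "MetaCharSet",
    "B": "Behaviors", "C": "ClientPull", "O": "Offline",
    "R": "ForceOffline", "G": "IgnoreCache",
}


def decode_download_options(value):
    """Map each letter to True (uppercase, allow) or False (lowercase, disallow)."""
    settings = {}
    for letter in value:
        name = DOWNLOAD_OPTIONS.get(letter.upper())
        if name is None:
            raise ValueError("unknown download option letter: " + repr(letter))
        settings[name] = letter.isupper()
    return settings


print(decode_download_options("xjS"))
# {'ActiveX': False, 'Java': False, 'Scripts': True}
```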

/ImageFilePath=value, /ifp=value

   BRIEF DESCRIPTION

   Specifies the directory path to which captured images will be saved.

   ACCEPTED VALUES

   The value must either be blank, or a valid file directory path.

   DETAILED DESCRIPTION

   The "ImageFilePath" option is used to specify the path of the

   directory to which captured image files will be saved. It may be an

   absolute path or a path relative to the current working directory.

   If the directory does not already exist when an image is saved, the

   directory will be automatically created.

   EXAMPLES

   /ImageFilePath="C:\Captured Website Images\"

   /ifp=C:\Images

/ImageFileName=value, /ifn=value

   BRIEF DESCRIPTION

   Specifies the name of a file to which a captured image will be saved.

   ACCEPTED VALUES

   The value must either be blank, or a valid file name.

   DETAILED DESCRIPTION

   The "ImageFileName" option is used to specify the name of a file to

   which a captured image will be saved. If a fully qualified file name

   (including an absolute path) is specified for the "ImageFileName"

   value, the value of "ImageFilePath" will be ignored and the path

   specified in the "ImageFileName" value will be used. Otherwise, the

   path information in the "ImageFilePath" value will be combined with

   the file name (and optional relative path information) in the

   "ImageFileName" value to create the fully qualified file name.

   If the value for "ImageFileName" is blank, a file name will be

   automatically generated from the page title, if available, or from

   the URL if the page title is not available. If a file extension is

   not specified, the correct file extension for the image file format

   will be automatically appended. If a file already exists with the

   same name as the file to be saved, either it will be overwritten,

   or the file will be saved under a different name with numbers appended

   to make it unique, depending on the value of the "ImageFileOverwrite"

   option.

   EXAMPLES

   /ImageFileName=CapturedImage.png

   /ifn=C:\Images\WebsiteImage.jpg
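
   The rules above for combining "ImageFilePath" and "ImageFileName" can
   be modeled with a short Python sketch. It is an illustration of the
   documented behavior only, not the program's own code. The function
   name and the extension table are hypothetical; only the ".png"
   extension is confirmed by this documentation, and the others are
   assumed.

```python
import ntpath  # Windows path rules, usable on any platform

# Assumed extensions per image type; only ".png" is confirmed by the docs.
EXTENSIONS = {"PNG": ".png", "JPEG": ".jpg", "BMP": ".bmp", "GIF": ".gif"}


def resolve_image_file(image_file_path, image_file_name, image_file_type="PNG"):
    """Model how ImageFilePath and ImageFileName combine into one file name."""
    if not image_file_name:
        # A blank name is generated from the page title or URL by the
        # program; that generation scheme is not modeled here.
        raise ValueError("blank file names are not modeled in this sketch")
    if ntpath.isabs(image_file_name):
        # A fully qualified name wins and ImageFilePath is ignored.
        full_name = image_file_name
    else:
        # Otherwise the directory and the (possibly relative) name are joined.
        full_name = ntpath.join(image_file_path, image_file_name)
    if not ntpath.splitext(full_name)[1]:
        # No extension given, so append the one matching the image type.
        full_name += EXTENSIONS[image_file_type.upper()]
    return full_name
```

   For example, resolve_image_file(r"C:\Images", "ExampleFileName")
   yields "C:\Images\ExampleFileName.png" under these assumptions.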

/ImageFileType=value, /ift=value

   BRIEF DESCRIPTION

   Determines the file format in which the captured image will be saved.

   ACCEPTED VALUES

   "PNG", "JPEG", "BMP", "GIF".

   The default value is "PNG".

   DETAILED DESCRIPTION

   The "ImageFileType" option is used to select the file format in which

   the captured image will be saved. The "BMP" value is used to select

   an uncompressed bitmap format. This will result in the largest image

   file, but will not lose image detail if the captured image is saved at

   full size. The "PNG" file format uses non-lossy compression that also

   will not lose image detail, but results in a much smaller image file.

   The "JPEG" file format uses lossy compression that will often create

   the smallest image files for image captures with large amounts of

   detail, but some of the detail will be lost, even if the captured

   image is saved at full size. The "GIF" file format uses non-lossy

   compression and can generate reasonably small images, but it uses a

   reduced color palette so the colors of the captured image may not

   be the same as the original colors.

   EXAMPLES

   /ImageFileType=PNG

   /ift=JPEG

   /ift=BMP

/ImageResizePercent=value, /irp=value

   BRIEF DESCRIPTION

   Determines the resize percent of the image to be saved.

   ACCEPTED VALUES

   The value must be an integer in the range 1 - 200.

   The default value is 100.

   DETAILED DESCRIPTION

   The "ImageResizePercent" option is used to determine the size of the

   image to be saved relative to the page image capture area as set by

   the "CapturePageWidth" and "CapturePageHeight" options. For example,

   if the "CapturePageWidth" and "CapturePageHeight" options are set to

   800 and 600, respectively, and the "ImageResizePercent" option is set

   to 50, then the width and height of the original page capture will be

   800 and 600, but the width and height of the resized image, which is

   the image that is saved to disk, will be 400 and 300. Note that the

   "ImageResizePercent" option determines the relative width and height

   of the resized image, not the relative image area (if the width and

   height of the resized image are 50 percent of the original image, the

   resized image area will actually be 25 percent of the original image

   area). This option can be used either to shrink or to enlarge the

   image to be saved relative to the captured page area. Note that some

   blurring or loss of detail may occur when the image is resized.

   EXAMPLES

   /ImageResizePercent=80

   /irp=33
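
   The arithmetic described above can be checked with a small Python
   sketch. The function name is hypothetical, and rounding to the
   nearest pixel is an assumption, since the documentation does not say
   how fractional sizes are handled.

```python
def resized_dimensions(capture_width, capture_height, resize_percent):
    """Return the (width, height) of the saved image for a resize percent."""
    if not 1 <= resize_percent <= 200:
        raise ValueError("ImageResizePercent must be in the range 1 - 200")
    # The percent scales each dimension, not the area. Rounding to the
    # nearest pixel is an assumption made for this sketch.
    width = max(1, round(capture_width * resize_percent / 100))
    height = max(1, round(capture_height * resize_percent / 100))
    return width, height


def area_ratio(resize_percent):
    """Return the saved image area as a fraction of the captured area."""
    return (resize_percent / 100) ** 2
```

   With a capture area of 800 by 600 and a resize percent of 50, the
   sketch gives 400 by 300, and area_ratio(50) gives 0.25, matching the
   25 percent figure in the description.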

/ImageCompressionQuality=value, /icq=value

   BRIEF DESCRIPTION

   Determines the compression quality ratio for JPEG images.

   ACCEPTED VALUES

   The value must be an integer in the range 1 - 100.

   The default value is 80.

   DETAILED DESCRIPTION

   The "ImageCompressionQuality" option is used to determine the quality

   of JPEG images, which use a form of lossy compression that sacrifices

   some image detail or quality in exchange for smaller image file sizes.

   A value of 100 gives the highest quality JPEG image, but also creates

   the largest JPEG file size. A value of 1 gives the lowest quality

   JPEG image, but creates the smallest JPEG file size. A value in the

   range of 70 to 90 will generally provide a reasonable balance between

   good quality and small image size. Note that this option is ignored

   when image types other than JPEG are used.

   EXAMPLES

   /ImageCompressionQuality=80

   /icq=30

/ImageGrayscaleConvert[=value], /igc[=value]

   BRIEF DESCRIPTION

   Determines whether captured images are converted to grayscale.

   ACCEPTED VALUES

   "1", "yes", "true", "enable"; "0", "no", "false", "disable".

   If this option is not specified, the default value is "0" or

   "disable". If this option is specified without a value, the default

   value is "1" or "enable".

   DETAILED DESCRIPTION

   The "ImageGrayscaleConvert" option is used to determine whether an

   image will be converted to grayscale prior to being saved to disk. If

   this option is enabled, the final image saved to disk will be saved as

   a grayscale (shades of gray ranging from black to white) image, which

   means that there will be no color in the saved image, even if there

   was color in the original captured page. If this option is disabled,

   the final image saved to disk will have color as long as there was

   color in the original captured image. An image file saved in

   grayscale mode will generally be significantly smaller than an

   equivalent image file saved in color mode.

   EXAMPLES

   /ImageGrayscaleConvert=no

   /igc=true

   /igc
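
   The accepted values and defaults for this switch can be summarized in
   a Python sketch. It models the documented rules only. The function
   name is hypothetical, and treating values as case-insensitive is an
   assumption.

```python
ENABLED_VALUES = {"1", "yes", "true", "enable"}
DISABLED_VALUES = {"0", "no", "false", "disable"}


def parse_grayscale_switch(argument):
    """Interpret '/igc', '/igc=value', or None (switch not given)."""
    if argument is None:
        return False  # option not specified: default is disabled
    _name, separator, value = argument.partition("=")
    if not separator:
        return True  # specified without a value: default is enabled
    value = value.lower()  # case-insensitivity is an assumption
    if value in ENABLED_VALUES:
        return True
    if value in DISABLED_VALUES:
        return False
    raise ValueError(f"unrecognized value: {value!r}")
```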

/ImageFileOverwrite[=value], /ifo[=value]

   BRIEF DESCRIPTION

   Determines whether existing image files will be overwritten.

   ACCEPTED VALUES

   "1", "yes", "true", "enable"; "0", "no", "false", "disable".

   The default value is "1" or "enable".

   DETAILED DESCRIPTION

   The "ImageFileOverwrite" option is used to determine whether an

   existing file will be overwritten in the case where a new image file

   is to be saved using the same file name. If this option is enabled

   (and it is by default) then when a file of the same name already

   exists, it will be overwritten by the new file as long as the existing

   file is not already open with an exclusive lock that prevents writing

   to the file. If this option is disabled, when a file already exists

   with the same name as the new file to be saved, then the new file is saved

   under a different, unique file name. The unique file name is

   generated by appending one or more digits to the old file name as

   needed until there is no existing file of the same name.

   EXAMPLES

   /ImageFileOverwrite=disable

   /ifo=1

   /ifo
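
   The unique file name behavior used when overwriting is disabled can
   be illustrated with a Python sketch. The documentation says only that
   digits are appended until the name is unique; placing the digits
   before the file extension and starting at 1 are assumptions, and the
   function name is hypothetical.

```python
import os


def unique_file_name(file_name, exists=os.path.exists):
    """Return file_name, or a numbered variant if that name is taken."""
    if not exists(file_name):
        return file_name
    stem, extension = os.path.splitext(file_name)
    counter = 1
    # Assumed scheme: "shot.png" -> "shot1.png" -> "shot2.png" ...
    while exists(f"{stem}{counter}{extension}"):
        counter += 1
    return f"{stem}{counter}{extension}"
```

   The "exists" parameter defaults to a real file system check and can
   be replaced with any predicate, which makes the sketch easy to try
   without creating files.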

Support and Contact Information

For further information or to obtain support, please use the following contact information:

Website: http://www.pagevisualize.com

Email: support@pagevisualize.com

PageCaptureLibrary.dll v1.0.0

PageVisualize Website Image Capture SDK

Copyright (c) 2004 Lucid Step Software

Introduction

The PageCaptureLibrary.dll v1.0.0 is a COM automation object that is used to capture images of websites. It has a simple yet flexible API. It can be used from any Windows programming language that can make use of COM automation objects, including Visual Basic, C++, ASP, Delphi, C#, or any .NET language. It has the capability to process PageVisualize Command files (.pvc files), which are text files that contain a sequence of option statements and page image capture commands.

The COM object is multithreaded, and is therefore capable of performing multiple webpage image capture operations simultaneously. It is very flexible and provides many options to customize the website image capturing process. The PageCaptureLibrary.dll COM object is ideal for automating the page capturing process. The page capturing engine is very fast and is capable of processing very large page capture batches.

Getting Started

Before you can begin using the PageCaptureLibrary.dll COM object, it should first be registered with the operating system so that Windows will be able to locate it when needed. If installed by the installer, the COM object is registered automatically. Otherwise, it is necessary to register it manually. To register it manually, open a command prompt and change the current working directory to the directory where the dll file is located ("C:\Program Files\PageVisualize\Component\" by default). Then run the following command:

regsvr32 PageCaptureLibrary.dll

If you use the uninstaller to remove the PageCaptureLibrary.dll COM object, it will be unregistered automatically. If you need to unregister the COM object manually, open a command prompt and change the current working directory to the directory where the dll file is located ("C:\Program Files\PageVisualize\Component\" by default). Then run the following command:

regsvr32 /u PageCaptureLibrary.dll

When you are ready to start using the COM object, you should first test a simple example to make sure it is registered correctly and that everything works. Open Notepad (or your preferred text editor) and copy and paste in the following sample VBScript code. Change the URL from "http://www.example.com" to some other valid URL, if you would like to do so. Then save the file and name it something like "TestPageCapture.vbs".

' Create an instance of the page capture engine

Set pageCaptureEngine = CreateObject("PageCaptureLibrary.PageCapture")

' Capture a web page image using default options and save it to a file

pageCaptureEngine.CapturePageSynchronously "http://www.example.com", "ExampleFileName"

' Release the page capture engine instance

Set pageCaptureEngine = Nothing

' All finished, so quit

Wscript.Quit

In Windows Explorer, navigate to the folder where you saved the "TestPageCapture.vbs" file and double-click the file. It should execute quietly, and soon an image file named "ExampleFileName.png" should appear in the folder. Double-click the image file to open it in your default image viewer. If it appears to be a captured image of the URL you saved in the test script file, then everything is working correctly. Otherwise, double-check that you followed all of the instructions above.

API Methods Overview

The PageCaptureLibrary.dll COM object provides a COM interface named IPageCapture that exposes the following methods:

HRESULT _stdcall CapturePageSynchronously(

 [in] BSTR URL,

 [in] BSTR ImageFileName

);

HRESULT _stdcall CapturePageSynchronouslyWithOptions(

 [in] BSTR URL,

 [in] int CapturePageWidth,

 [in] int CapturePageHeight,

 [in] int CaptureTimeoutSeconds,

 [in] int CaptureMinProgressToKeep,

 [in] BSTR CaptureDownloadOptions,

 [in] BSTR ImageFilePath,

 [in] BSTR ImageFileName,

 [in] BSTR ImageFileType,

 [in] int ImageResizePercent,

 [in] int ImageCompressionQuality,

 [in] int ImageGrayscaleConvert,

 [in] int ImageFileOverwrite

);

HRESULT _stdcall CapturePageAsynchronously(

 [in] BSTR URL,

 [in] BSTR ImageFileName

);

HRESULT _stdcall CapturePageAsynchronouslyWithOptions(

 [in] BSTR URL,

 [in] int CapturePageWidth,

 [in] int CapturePageHeight,

 [in] int CaptureTimeoutSeconds,

 [in] int CaptureMinProgressToKeep,

 [in] BSTR CaptureDownloadOptions,

 [in] BSTR ImageFilePath,

 [in] BSTR ImageFileName,

 [in] BSTR ImageFileType,

 [in] int ImageResizePercent,

 [in] int ImageCompressionQuality,

 [in] int ImageGrayscaleConvert,

 [in] int ImageFileOverwrite

);

HRESULT _stdcall ProcessCommandFileAsynchronously(

 [in] BSTR CommandFileName

);

HRESULT _stdcall ProcessCommandTextAsynchronously(

 [in] BSTR CommandFileText

);

HRESULT _stdcall WaitForAsynchCompletion(

 void

);

The "CapturePageSynchronously", "CapturePageSynchronouslyWithOptions", "CapturePageAsynchronously", and "CapturePageAsynchronouslyWithOptions" methods are all used to capture page images.

The two synchronous methods, "CapturePageSynchronously" and "CapturePageSynchronouslyWithOptions", will not return to the caller until the page capture operation has completed or timed out. The two asynchronous methods, "CapturePageAsynchronously" and "CapturePageAsynchronouslyWithOptions", will queue up the capture operation for asynchronous processing and then return to the caller immediately.

The methods named "CapturePageSynchronously" and "CapturePageAsynchronously" only accept a URL and an ImageFileName as parameters, and rely on the properties of the PageCapture object instance for the rest of the option values. The methods named "CapturePageSynchronouslyWithOptions" and "CapturePageAsynchronouslyWithOptions" accept all available options as parameters. However, it is still possible to fall back on the values stored in the properties of the PageCapture object instance by passing in a negative value (if the parameter is an integer) or an empty string (if the parameter is a string).
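
The fallback rule for the "WithOptions" methods can be sketched in Python. The helper below builds the full argument list in the order given by the method signature, passing -1 for any integer option and an empty string for any string option that the caller leaves out, so that the object's property values apply. The helper name is hypothetical, and the sketch only prepares arguments; it does not call the COM object.

```python
# Option parameters in signature order, following the URL parameter.
OPTION_PARAMETERS = [
    ("CapturePageWidth", int),
    ("CapturePageHeight", int),
    ("CaptureTimeoutSeconds", int),
    ("CaptureMinProgressToKeep", int),
    ("CaptureDownloadOptions", str),
    ("ImageFilePath", str),
    ("ImageFileName", str),
    ("ImageFileType", str),
    ("ImageResizePercent", int),
    ("ImageCompressionQuality", int),
    ("ImageGrayscaleConvert", int),
    ("ImageFileOverwrite", int),
]


def build_option_arguments(url, **overrides):
    """Return the 13 positional arguments for a ...WithOptions call."""
    known = {name for name, _kind in OPTION_PARAMETERS}
    unknown = set(overrides) - known
    if unknown:
        raise TypeError(f"unknown options: {sorted(unknown)}")
    arguments = [url]
    for name, kind in OPTION_PARAMETERS:
        # A negative integer or an empty string means "use the property".
        fallback = -1 if kind is int else ""
        arguments.append(overrides.get(name, fallback))
    return arguments
```

Given an object instance named engine, a call such as engine.CapturePageSynchronouslyWithOptions(*build_option_arguments("http://www.example.com", ImageFileType="JPEG", ImageCompressionQuality=90)) would then override only the two named options.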

The "ProcessCommandFileAsynchronously" method is used to process the commands in a PageVisualize command file. The method accepts a single string parameter named CommandFileName, which should contain the name of the command file to be processed. Please see the PageVisualize Command Files section in the document named "Documentation-PageVisualize.rtf" for information about how to create and use command files.

The "ProcessCommandTextAsynchronously" method is similar to the "ProcessCommandFileAsynchronously" method, but instead of accepting the name of a command file as its sole parameter, it accepts the actual command text (the text that would otherwise be stored in a command file) as its sole parameter. Please see the PageVisualize Command Files section in the document named "Documentation-PageVisualize.rtf" for information about the command text syntax.

The "WaitForAsynchCompletion" method is used whenever it is necessary to wait (or block) until all queued and in-progress asynchronous page capture operations have completed. For example, a program that captures a batch of page images asynchronously and then terminates could call "CapturePageAsynchronously" several times, followed by a single call to "WaitForAsynchCompletion", after which it is safe for the program to exit. If the call to "WaitForAsynchCompletion" were omitted, the asynchronous page capture operations would be started, but the process would terminate immediately and the capture threads would be ended before completing their work. For this reason, it is very important to call "WaitForAsynchCompletion" after any group of asynchronous operations is initiated. It is not necessary to call "WaitForAsynchCompletion" after a synchronous method (such as "CapturePageSynchronously"), since synchronous methods already wait for their own completion before returning to the caller.
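
The batch pattern described above might look like the following Python sketch. It assumes the pywin32 package (win32com) is installed and the COM object is registered; the ProgID is the one used in the VBScript sample, and the two function names are hypothetical. The capture function accepts any object exposing the two methods, so it can be tried with a stand-in object on a machine without the DLL.

```python
def create_engine():
    """Create the COM object (needs Windows, pywin32, and a registered DLL)."""
    import win32com.client  # third-party pywin32 package, imported lazily
    return win32com.client.Dispatch("PageCaptureLibrary.PageCapture")


def capture_batch(engine, captures):
    """Queue each (url, image_file_name) pair, then block until all finish."""
    try:
        for url, image_file_name in captures:
            # Returns immediately; the capture runs on a worker thread.
            engine.CapturePageAsynchronously(url, image_file_name)
    finally:
        # Always wait, so that captures already queued are not cut off
        # when the process exits.
        engine.WaitForAsynchCompletion()
```

A script would call capture_batch(create_engine(), [("http://www.example.com", "ExampleFileName")]) and could then exit safely once the function returns.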

API Properties Overview

The PageCaptureLibrary.dll COM object IPageCapture interface exposes the following properties:

LogFileName [out, retval] BSTR

LogFileOption [out, retval] BSTR

MaxCaptureThreads [out, retval] int

BatchTimeoutSeconds [out, retval] int

ParseWithoutProcessing [out, retval] int

CapturePageWidth [out, retval] int

CapturePageHeight [out, retval] int

CaptureTimeoutSeconds [out, retval] int

CaptureMinProgressToKeep [out, retval] int

CaptureDownloadOptions [out, retval] BSTR

ImageFilePath [out, retval] BSTR

ImageFileType [out, retval] BSTR

ImageResizePercent [out, retval] int

ImageCompressionQuality [out, retval] int

ImageGrayscaleConvert [out, retval] int

ImageFileOverwrite [out, retval] int

Each property that is of type string (BSTR) accepts or returns a string containing the option value. Each property that is of type integer (int) accepts or returns an integer containing the option value. The boolean options (options that can be either enabled or disabled) are represented as integers, and they accept or return the integer "0" to mean disabled and "1" to mean enabled. The boolean options are "ParseWithoutProcessing", "ImageGrayscaleConvert", and "ImageFileOverwrite".

For descriptions of the meaning and usage of each option property, please refer to the Options Reference section in the document named "Documentation-PageVisualize.rtf".

Sample Code

For sample code that demonstrates how to use each of the COM object API methods and properties, please refer to the "Samples" subfolder of the program installation root folder.

Support and Contact Information

For further information or to obtain support, please use the following contact information:

Website: http://www.pagevisualize.com

Email: support@pagevisualize.com