--- changes per libwhisker release -------------------------------------

[] libwhisker 2.2

- Sullo pointed out that the api_demo.pl script uses save_ssl_info, rather
  than ssl_save_info. As it turns out, api_demo.pl was horribly out of
  date. It's been given an overhaul using the new libwhisker 2.x
  programming semantics.
- Sullo also pointed out that ssl_save_info wasn't working. This was
  because _http_do_request_ex() was erasing the ssl data previously set by
  http_do_request().
- Added 'use_referrers' crawl() config option, which is enabled by
  default. This causes the crawler to send appropriate HTTP referrers for
  all crawled links.
- Changed some internals of crawl() to keep the url_queue inside the crawl
  object, which makes it accessible to the source callback function.
- Added 'install_lw1' option to Makefile to automatically install the
  LW(1) compatibility bridge file (which emulates Libwhisker 1.x, letting
  Libwhisker version 1.x programs use the Libwhisker 2.x library
  transparently).
- Added cookie_new_jar() function, to handle creating new jars (which are
  still hashes at this point, but you should still use the function).
- More documentation updates.
- http_fixup_request() now forces the correct POST data length, and does
  some extra checks to make sure leftover POST headers from previous
  requests are dealt with.
- Added utils_delete_lowercase_key() function for deleting hash keys
  without worrying about the capitalization.
- Small change to http_read_headers() to reset the match position pointer.
- sgt_b made me realize the need for a utils_find_key() function, which is
  similar to utils_find_lowercase_key() but is case-sensitive. Sure, a
  quick hash lookup would find the key too, but utils_find_key() has the
  added bonus of dereferencing anonymous arrays of multiple header values,
  if encountered.
- Dave King's query on handling form data returned from forms_read() led
  me to create a forms_walkthrough.txt document and the form_demo.pl
  scripts.
- Found a bug in _forms_parse_callback() (used internally by forms_read())
  which caused it to mishandle textareas. The html_find_tags() function
  already does the work of finding the closing tag, so
  _forms_parse_callback() doesn't need to do it. The HTML parser must do
  it this way, since tags within a <textarea> are not to be parsed.
- Bug in uri_parse_parameters(), which took a shortcut exit if a '&'
  wasn't found. The shortcut should actually be triggered on a '='.
- Bug in html_find_tags() which caused non-value attributes to be saved
  with an empty string value. Now non-value attributes are saved with an
  undef value.
- For some reason an extra empty element was being introduced into the
  hash during forms_write(). I've traced it and can't figure out where
  it's coming from. In the meantime, I've added a check to make sure any
  empty elements don't show up in the output.
- Mathieu Dessus pointed out a bug where the older anti_ids() function was
  still being called, rather than the renamed encode_anti_ids(). You still
  use {whisker}->{anti_ids} to set the values though...

----------------------------------------------------------------------------

[] libwhisker 2.1

- Sullo pointed out that $LW_HAS_SSL has disappeared. Forgot to document
  that. Use $LW_SSL_LIB instead.
- Changed a (len!=0) to (len>0) check in the chunk decoder, to be more
  robust.
- Added html_link_extractor() function, which uses code already present in
  the crawl module.
- The regex was a bit broken in encode_uri_randomhex(). Pointed out by
  John McDonald.
- John also found a typo in encode_anti_ids(), causing it to call the
  nonexistent function encode_randomase().
- New Makefile.pl build environment.
- Bug in forms_read() and _forms_callback() which prevented the proper
  storage of multiple forms.

----------------------------------------------------------------------------

[] libwhisker 2.0

- Libwhisker 2.0 is officially dubbed LW2. Below are the incompatible
  changes from libwhisker 1.x.
  There were lots of general changes, but only the
  non-backwards-compatible ones are documented.
- The following were renamed:
    {whisker}->{req_spacer*}            => {whisker}->{http_space*}
    {whisker}->{http_ver}               => {whisker}->{version}
    {whisker}->{http_protocol}          => {whisker}->{protocol}
    {whisker}->{uri_param}              => {whisker}->{parameters}
    {whisker}->{recv_header_order}      => {whisker}->{header_order}
    {whisker}->{http_resp_message}      => {whisker}->{message}
    {whisker}->{INITIAL_MAGIC}          => {whisker}->{MAGIC}
    {whisker}->{sockstate}              => {whisker}->{socket_state}
    utils_lowercase_(hashkeys|headers)  => utils_lowercase_keys
    utils_split_uri                     => uri_split
    utils_join_uri                      => uri_join
    utils_normalize_uri                 => uri_normalize
    utils_absolute_uri                  => uri_absolute
    utils_get_dir                       => uri_get_dir
    utils_unidecode_uri                 => decode_unicode
    anti_ids                            => encode_anti_ids
    bruteurl                            => utils_bruteurl
    auth_set_header                     => auth_set
    encode_str2uri                      => encode_uri_hex
    encode_str2ruri                     => encode_uri_randomhex
    dumper                              => dump
    dumper_writefile                    => dump_writefile
- The following are now deprecated (along with their functionality):
    {whisker}->{method_postfix}
    {whisker}->{http_req_trailer}
    {whisker}->{queue_md5}              (use {request_fingerprint})
    {whisker}->{http_resp}              (use {code})
    {whisker}->{retry_errors}
    {whisker}->{ids_session_splice}
    do_auth                             (use auth_set)
    upload_file
    download_file                       (use get_page_to_file)
    md5_perl                            (use md5)
    md4_perl                            (use md4)
    (en|de)code_base64_perl             (use (en|de)code_base64)
    crawl_get_config
    crawl_set_config
- {whisker}->{parameters} will not be included if it's an empty string.
- {whisker}->{normalize_incoming_headers} now changes AA-Bb-cc-dD to
  Aa-Bb-Cc-Dd, instead of the prior AA-Bb-Cc-DD.
- The invalid HTTP response error message does not include the invalid
  response (but it's still in {whisker}->{data}).
- IDS session splicing is deprecated. Most IDSes do stream reassembly
  anyway, so this is not a big loss. The deprecation is due to limitations
  of the current stream implementation. It will reappear in future
  versions.
- cookie_* now operates independently of the actual set-cookie header.
  http_do_request now has internal magic, so that all cookies are saved
  and processed regardless of header capitalization, normalization, and
  duplication (including the default ignore_duplicate_headers).
- Lots of the global variables were changed/renamed or removed. See
  globals.pl for details.
- Crawl was completely rebuilt to be more object-ish (the use of so many
  global variables made it hard to have multiple crawl sessions going at
  once). If you were using crawl(), then you will need to review the new
  way of calling crawl() and accessing related data. All the crawl data
  structures (and locations) were changed, as was the format for
  configuring the crawler and callbacks.
- Dumper() returns undef on error, instead of the string 'ERROR'.
- html_find_tags() takes a few more optional parameters. Using a tag map
  can lead to speed increases by reducing the number of times the callback
  function is actually called.
- The libwhisker 1.x series did not properly generate forms structures
  (via forms_read()). It was corrected, but the generated structure, while
  now accurate per the documentation, is not backwards-compatible.
- Authorization is now handled via auth_set(), and not merely by the
  presence of the Authorization header. Also, the internal
  {whisker}->{ntlm_*} keys relating to NTLM authentication have been
  deleted. You shouldn't have been using them anyway. :)
- Socket timeout values are read from {whisker}->{timeout}, and are saved
  per stream. The global $TIMEOUT variable no longer exists.
- HTML rewriting via html_find_tags() is now done by calling
  html_find_tags_rewrite() within your callback function. The return value
  of the callback is ignored (and thus not required, unlike LW1.x).
- auth_set() will now call http_reset() whenever any NTLM-based
  authentication is used. This is because NTLM is a connection-based
  authentication, and thus all connections need to start from scratch when
  NTLM is enabled.
- The ETag header is now normalized to ETag, and not Etag.
- All new POD documentation, which follows the more standard format for
  use with pod2man.
- utils_find_lowercase_key() will now dereference multi-value entries and
  return a full array if it is called in array context.
- A bug in Crypt::SSLeay (Net::SSL) 0.51 (and probably prior) causes it to
  puke when it is used in proxy mode. Hopefully it will be fixed in future
  versions.
- Turns out the Net::SSLeay implementation of MD5 was returning bad hashes
  (it truncated them at the first NULL byte). Use of Net::SSLeay::md5 has
  been discontinued permanently.
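
The 2.0 notes on header normalization (AA-Bb-cc-dD becoming Aa-Bb-Cc-Dd,
with ETag as a special case) and on case-insensitive key lookups that
dereference multi-value entries can be illustrated with a short sketch.
This is not libwhisker's Perl code: it is a minimal Python illustration,
and the function names normalize_header_name() and find_lowercase_key()
are hypothetical stand-ins chosen for this example, assuming that repeated
headers are stored as a list of values under one key.

```python
def normalize_header_name(name):
    """Capitalize each dash-separated part: 'AA-Bb-cc-dD' -> 'Aa-Bb-Cc-Dd'."""
    normalized = "-".join(part.capitalize() for part in name.split("-"))
    # Special case from the notes above: ETag keeps its capital T.
    if normalized == "Etag":
        return "ETag"
    return normalized


def find_lowercase_key(headers, key):
    """Case-insensitive lookup that returns every matching value as a list."""
    wanted = key.lower()
    found = []
    for name, value in headers.items():
        if name.lower() != wanted:
            continue
        # Assumption: a header seen several times is stored as a list,
        # so flatten it and hand the caller all of the values.
        if isinstance(value, list):
            found.extend(value)
        else:
            found.append(value)
    return found
```

Calling normalize_header_name("AA-Bb-cc-dD") yields "Aa-Bb-Cc-Dd", and
find_lowercase_key() on a hash holding two Set-Cookie values returns both
of them regardless of how the key was capitalized.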