results dict; changed results dict to allow getting values with results.key
as well as results[key]; work around embedded ill-formed HTML with half
a DOCTYPE; work around malformed Content-Type header; if character encoding
is wrong, try several common ones before falling back to regexes (if this
works, bozo_exception is set to CharacterEncodingOverride); fixed character
encoding issues in BaseHTMLProcessor by tracking encoding and converting
from Unicode to raw strings before feeding data to sgmllib.SGMLParser;
convert each value in results to Unicode (if possible), even if using
regex-based parsing