Ebookmaker

Latest version: v0.12.47

0.12.0b2

- Changes to the cli were needed for ebookconverter integration
- `--notify` and `--validate` are now flags that turn on validation and notification
- To prevent any issues with pickled jobs being sent via stdin to subprocesses, the newer subprocess API is now used to run validators and MOBI generators.
- TxtWriter also creates a target directory if it doesn't exist.
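
The stdin isolation mentioned above can be sketched roughly as follows. This is a minimal illustration, not Ebookmaker's actual code: it assumes "the newer subprocess api" means `subprocess.run`, and `run_tool` is a hypothetical helper name.

```python
import subprocess
import sys


def run_tool(cmd, timeout=60):
    """Run an external tool without letting it read the parent's stdin.

    Hypothetical helper: with stdin=subprocess.DEVNULL the child sees an
    empty stdin, so data queued for the parent cannot leak into it.
    """
    result = subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,   # child gets an empty stdin
        capture_output=True,        # collect stdout and stderr
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # The child tries to read stdin and reports what it received.
    code, out, err = run_tool(
        [sys.executable, "-c", "import sys; print(repr(sys.stdin.read()))"]
    )
    print(code, out.strip())
```

A real validator or MOBI generator command would take the place of the Python one-liner used here.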

0.12.0b1

- fix windows exception for unpadded date format
- add target directory creation to epubwriter
- remove gaps in playOrder for EPUB2
- don't count the size of the chunk template for chunking
- remove xml:space attributes - not allowed in EPUB or HTML5
- EpubWriter now creates a target directory if it doesn't exist, as HTMLWriter does
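
As an illustration of removing gaps in `playOrder`, the sketch below renumbers the `navPoint` elements of an NCX document consecutively in document order. It is a simplification under stated assumptions: `renumber_play_order` is a hypothetical name, and a real NCX may need entries that point at the same target to share a value.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"


def renumber_play_order(root):
    """Assign consecutive playOrder values (1, 2, 3, ...) in document order."""
    count = 0
    for count, nav_point in enumerate(root.iter(f"{{{NCX_NS}}}navPoint"), start=1):
        nav_point.set("playOrder", str(count))
    return count


if __name__ == "__main__":
    ncx = ET.fromstring(
        f'<ncx xmlns="{NCX_NS}"><navMap>'
        '<navPoint id="a" playOrder="1"/>'
        '<navPoint id="b" playOrder="3"/>'
        '<navPoint id="c" playOrder="7"/>'
        "</navMap></ncx>"
    )
    renumber_play_order(ncx)
    print([n.get("playOrder") for n in ncx.iter(f"{{{NCX_NS}}}navPoint")])
```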

0.12.0b0 July 12, 2022. beta, almost for production.

- update to libgutenberg 0.10.0 - much improved logging when run from ebookconverter
- always set the lang attribute on html element
- added `--validate=(true/false)` to CommonCode so that EbookConverter can set/unset it via CLI. The option can turn off validation even when a validator is installed, which is needed for the rebuild script
- added `--notify=(true/false)` to CommonCode so that EbookConverter can set/unset it via CLI.
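
A `--validate=(true/false)` style option can be modelled with `argparse` roughly as below. This is a hedged sketch only: `str_to_bool` and `build_parser` are hypothetical names, the accepted spellings are an assumption, and it does not reproduce Ebookmaker's CommonCode.

```python
import argparse


def str_to_bool(value):
    """Convert 'true'/'false' style strings to a bool for argparse."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser():
    """Build a tiny parser with the two boolean-valued options."""
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument("--validate", type=str_to_bool, default=False,
                        metavar="true|false")
    parser.add_argument("--notify", type=str_to_bool, default=False,
                        metavar="true|false")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--validate=false", "--notify=true"])
    print(args.validate, args.notify)
```

An explicit `false` value lets a calling program switch validation off even where it would otherwise be enabled, which a plain on/off flag cannot express.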

0.12.0a1 June 17, 2022. alpha, not for production.

- update to libgutenberg 0.9.3 - much improved logging
- fix boilerplate insertion; only replace boilerplate in the first document
- catch errors for each job in a job queue so that the rest of the queue can execute
- fixed disappearing wrapped images
- add a pyproject.toml file. Seems to get rid of the SetuptoolsDeprecationWarning
- moved code to a src directory so as to keep test code out of distributions and play more nicely with new packaging standards.
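
The per-job error handling noted above ("catch errors for each job in a job queue") follows a common pattern, sketched here. The names `run_queue` and `run_job` are hypothetical and the logging call is an assumption about how failures might be reported.

```python
import logging


def run_queue(jobs, run_job):
    """Run every job; log and collect failures instead of aborting the queue."""
    failed = []
    for job in jobs:
        try:
            run_job(job)
        except Exception:
            # Record the traceback, then move on to the next job.
            logging.exception("job %r failed; continuing with the queue", job)
            failed.append(job)
    return failed


if __name__ == "__main__":
    done = []

    def run_job(job):
        if job == "bad":
            raise ValueError("simulated failure")
        done.append(job)

    print(run_queue(["a", "bad", "c"], run_job), done)
```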

0.12.0a0 June 14, 2022. alpha, not for production.

With 0.12, Ebookmaker adds EPUB3 and MOBI(KF8) as output formats.

- This version is being tested and deployed on Python 3.8. We will continue to address any issues with Python 3.7. We no longer support Python 3.5. We have not yet tested on Python 3.9 but we expect it works without change.
- Replaces Tidy with Beautiful Soup. Ebookmaker has used HTML Tidy to make sure that source files produced over the course of about 25 years can be parsed into a reasonably modern HTML DOM. With the advent of HTML5, Tidy has begun to show its age, and maintenance of Tidy has not kept up with the times. Bugs in Tidy are not being fixed, and we find we can no longer rely on Tidy. To replace Tidy, we are using Beautiful Soup, a very popular Python package widely used for web scraping.

Tidy also did some things that made Ebookmaker's HTML5 output poorly suited for PG:
- it reorganized style attributes into css style elements. While this made the CSS easier to manage, it resulted in less readable source code.
- it normalized whitespace in block elements. In almost all cases, this had no effect on the HTML display, but many PG contributors have used this whitespace to reproduce the printed pages in the source code, making it easy to maintain.

Beautiful Soup, by contrast, only changes the source when absolutely needed to make parsable unicode HTML. We expect the resulting HTML5 files will be more pleasing for PG contributors. Some code was added to the Ebookmaker HTML parser to reproduce some of the functionality that Tidy provided.
- Beautiful soup required some minor modification in error catching for missing files
- Incoming DOCTYPE is ignored
- Tidy provided some conversion of obsolete elements/attributes into xhtml4 elements with added CSS Rules.
- `font` elements are replaced with `span`s.
- `center` elements are replaced by `div`s. See note below about the CSS3 properties needed to reproduce the behavior of the `center` elements.
- when elements not permitted as `body` content are present as a child of the `body` element, they are wrapped in `div` elements
- A special formatter for Beautiful soup enforces Unicode Normal Form Composed.
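
Unicode Normal Form Composed (NFC) can be demonstrated with the standard library alone. The sketch below shows only the normalization step; how Ebookmaker wires it into a Beautiful Soup formatter is not shown here, and `to_nfc` is a hypothetical name.

```python
import unicodedata


def to_nfc(text):
    """Return text in Unicode Normal Form Composed (NFC)."""
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    decomposed = "Cafe\u0301"      # 'e' followed by a combining acute accent
    composed = to_nfc(decomposed)  # a single precomposed U+00E9 character
    print(len(decomposed), len(composed))
```

Both strings render identically, but only the composed form compares equal to text typed with a precomposed character.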

- Ebookmaker has been somewhat heavy-handed when removing deprecated elements and attributes. With this version of Ebookmaker, we make more of an effort to preserve the formatting of the source document. This will impact EPUB2, EPUB3 and HTML5 produced files.
- size attributes in `font` tags are translated to css rather than ignored.
- list styles are translated to css rather than ignored.
- size and width attributes on `hr` are translated to css rather than ignored.
- width attributes on `hr` are translated to css rather than ignored.
- deprecated align attributes on most elements are translated to css rather than ignored.
- bgcolor attributes on elements other than body are translated to css rather than ignored.
- values for the attributes align, frame, and rules are changed to lower case
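
Translating presentational attributes to CSS, as the items above describe, might look something like the sketch below. It is deliberately simplified and hypothetical: it handles only `align` and `bgcolor`, maps `align` to `text-align` (which is not right for every element type), and the function name is invented for illustration.

```python
import xml.etree.ElementTree as ET


def presentational_to_css(el):
    """Move align/bgcolor attributes into the element's style attribute."""
    declarations = []
    if "align" in el.attrib:
        # Attribute values are lower-cased on the way into CSS.
        declarations.append("text-align: " + el.attrib.pop("align").lower())
    if "bgcolor" in el.attrib and el.tag != "body":
        declarations.append("background-color: " + el.attrib.pop("bgcolor"))
    if declarations:
        existing = el.get("style", "")
        # New declarations go first so existing style content still wins.
        parts = declarations + ([existing] if existing else [])
        el.set("style", "; ".join(parts))
    return el


if __name__ == "__main__":
    p = ET.fromstring('<p align="CENTER" bgcolor="#ffeeee">text</p>')
    presentational_to_css(p)
    print(ET.tostring(p, encoding="unicode"))
```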

- a customization has been added for the cssutils module to permit us to add selected CSS properties we want to use (the built-in tables are getting old.) We needed to do this because certain conversions for obsolete elements could not be duplicated without using newer CSS properties. In particular:
- to reproduce the legacy `center` element, we added `display: flex` and `justify-content: center`.
- `speak` and `speak-as` css properties have been updated.

- for HTML5, a validation hook has been added. As with EPUB validation, add the path of your command-line HTML5 validator to the .ebookmaker config file and set the --validate flag. Tested with the W3C "Nu" validator - https://validator.github.io/validator/
- for HTML5, move colvalign to css
- for HTML5, change 3 letter language codes to 2 letter codes where available
- for HTML5, fill empty title elements
- for HTML5, improve handling of HTML4 tableframe and tablerules
- for HTML5, `article`, `section`, `header`, and `footer` are now allowed as top-level elements in `body`
- fixed crash in text file analysis when the number of lines in a paragraph exceeds log(max float), roughly 700
- include opentype fonts in EPUB file (.ttf, .otf, .woff), requires libgutenberg >= 0.8.14. fixes 106
- added an EPUB3 writer. In addition to producing valid EPUB3 files, some changes have been made to the produced EPUB.
- There is only an "-images" flavor. We continue to produce EPUB2 in images and no-images flavors
- Many changes in the HTML and CSS that were done for compatibility with e-readers are not done for EPUB3. The changes remain in place for EPUB2. For example:
- Floats are not removed.
- CSS absolute units are not changed.
- Uncommon characters and ligatures are not simplified.
- <q> elements are not rewritten.
- Preformatted sections are not reflowed.
- data elements are not stripped
- img class="dropcap" are not changed to spans
- any of the above that prove to be needed can be added back as needed
- all html4 -> html5 changes are made, no matter the source.
- it turns out that producers have long used workarounds to adjust for all the changes in support of limited-capability ereaders. For example, drop-caps in the HTML versions used `media(handheld)` and `x-ebookmaker` css rules to remove drop-caps that didn't work in EPUB. Now that we are no longer removing floats and the like, we had hoped to undo most of these accommodations for EPUB3. This proved to be too complex. `media(handheld)` rules are now replaced by `media (max-width: 480px)` for EPUB3, and `x-ebookmaker` is supplemented by `x-ebookmaker-2` for EPUB2 files and `x-ebookmaker-3` for EPUB3 files. Going forward, producers should try to avoid, as much as possible, using the `x-ebookmaker-3` class and instead use media queries so that customizations will also benefit small-screen users of the html files.
- For EPUB3, we still need to remove CSS rules that use the position property. Apple iBooks only allows the position property for fixed-layout EPUBs; for reflowable EPUBS, it appears to remove any elements that use `position: absolute`. It looks like absolute positioning is used mostly for page number anchors in the PG corpus, so we are retaining the behavior of hiding page number anchors when they use absolute positioning. Producers who want visible page number anchors should use floating elements.
- For EPUB3, in our initial testing, we found that setting a default body margin hurt more books than it helped, and we are now using different default CSS sheets for EPUB3 and EPUB2.
- CSS for the EPUB cover has been updated to better handle small or oddly sized cover images.
- Ebookmaker breaks HTML source into chunks to improve performance on EPUB readers. For EPUB3 files, the chunker treats `section` elements the same way it treated `div.section` elements for EPUB2. Similarly section elements in HTML5 source are converted to div.section elements for EPUB2. In addition, the maximum chunk size for EPUB3 is 300KB compared to 100KB for EPUB2.
- Ebookmaker now supports attributes in the epub namespace {http://www.idpf.org/2007/ops} These can be entered in source file in two ways:
- any `data-epub-*` attribute in an html or xhtml source file is moved to the epub namespace for EPUB3, stripped for EPUB2, and preserved as-is for HTML5. This option permits validation with the W3C 'nu' validator.
- any 'epub:*' attribute in a properly namespaced XHTML file will be preserved for EPUB3, stripped for EPUB2, and converted to a `data-epub-*` attribute in HTML5.
- This version expands support for accessibility attributes.
- the epub:role attribute (see above for using the epub namespace)
- HTML5 attributes `role`, `aria-label` and `aria-labelledby` help screen readers interpret HTML. see https://idpf.github.io/epub-guides/epub-aria-authoring/ for guidance about how to use these. Ebookmaker will strip these attributes for EPUB2 files.
- obsolete values of the `speak` CSS property are now updated to current CSS2/3 equivalents.
- as discussed above, `speak` and `speak-as` css properties are now included in EPUB, EPUB3 and HTML5 files.
- tibetan (bo) added to list of languages for mobi conversion by calibre
- fixed issue where backlinks required an id set on the original element
- HTML5 `wbr` tags (line break opportunity) are removed for EPUB2
- HTML5 and EPUB3 files no longer duplicate the lang attribute in xml:lang
- Ebookmaker is phasing out the use of Kindlegen, which has been unsupported by Amazon for a while. While Kindlegen can still be specified as the converter app in the config file, Calibre is now the default conversion app. The generated EPUB2 file is used as the source for MOBI (version 6) files, while EPUB3 files are used as the source for MOBI (KF8 format) files.
- fixed bug where dangling references were created by `x-ebookmaker-drop`
- for EPUB2, added the required summary attribute on table elements.
- for EPUB2, when an x-ebookmaker-page element is added, a `div` is made instead of an `a` when the element is a direct child of `body`
- for EPUB2 and EPUB3, when an x-ebookmaker-drop element containing an `id` is removed, a `div` is added instead of a `span` when the element was a direct child of `body`.
- for EPUB, fixed bug for irregular heading hierarchies
- work around bug in lxml >= 4.7 causing parse failures for rst conversions
- restored newlines in validation logging to make validation issues readable
- for conversions from RST: removed invalid 'classes' attribute
- for conversions from RST: added pg_boilerplate to generated headers
- for conversions from RST: stop printing the encoding as metadata
- for EPUB2 and EPUB3: Ebookmaker no longer makes an invalid reference when 'mailto:' links are present
- for EPUB2 and EPUB3: Adds a MIN_CHUNK_SIZE to avoid empty chunks when `body` begins with a section.
- when HTML or TXT source files are parsed, we attempt to identify Project Gutenberg "boilerplate". When detected, these sections are wrapped in `section` tags for HTML and `pre` for TXT, with appropriate ids. The three types of boilerplate identified are:
  - pg_header: usually a title and license declaration; sometimes title, book number, release date, authors, language, encoding, credits. When detected, metadata will be parsed and enclosed in a pg_metadata_raw sub-section.
  - pg_footer: usually the trademark license.
  - pg_smallprint: on older books, this will contain license-ish language and other material. It's usually found at the top of the text, and is often comically dated.
- for HTML5 and EPUB3, replace old boilerplate with up-to-date, generated boilerplate!
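
To give a feel for the boilerplate detection described above, here is a rough sketch that splits a plain-text book at the familiar "*** START OF THE PROJECT GUTENBERG EBOOK" line. The regular expression and the `split_header` helper are assumptions made for illustration; Ebookmaker's actual detection rules are not described here and are likely more involved.

```python
import re

# Assumed marker pattern; older files use several other variants.
START_MARKER = re.compile(
    r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.MULTILINE,
)


def split_header(text):
    """Return (header, rest); header is empty when no marker is found."""
    match = START_MARKER.search(text)
    if match is None:
        return "", text
    return text[:match.end()], text[match.end():]


if __name__ == "__main__":
    sample = (
        "Title: Example\n"
        "*** START OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
        "Chapter 1\n"
    )
    header, rest = split_header(sample)
    print(repr(header))
    print(repr(rest))
```

The header part is what would then be wrapped (in `pre` for TXT) and tagged as pg_header.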

0.11.30

- for EPUB, down-convert HTML5 tags to divs so the files validate as EPUB2. The new div elements will add a class named the same as the html5 tag, so `<section>` becomes `<div class="section">`. Other attributes are preserved. In addition CSS selectors involving these elements will be transformed accordingly: for example `section` becomes `div.section`
- `section`
- `figure` (initial style set to "margin: 1em 40px;", copying from Firefox internal stylesheet.)
- `figcaption`
- `header`
- `footer`
Users of these HTML5-only tags need to check that their CSS does not conflict with the added classes or changed CSS. In almost all cases, avoiding HTML5 element names for CSS classes will prevent any conflict. Users of HTML5 input may still encounter unresolved issues with other parts of the DP/PG tool chain; please examine output files carefully for unexpected behavior.
- for EPUB, move 'tfoot' elements to before 'tbody' (the order used in HTML4)
- for EPUB, remove any 'meta' elements using the 'property' attribute.
- add 'CRITICAL' notification for 'too-deep' errors
- reset parsers after txt jobs. fixes a bug when the plain text source file is linked from the html.
- EPUBCheck validation was broken. To use EPUBCheck validation, first download and install EPUBCheck from https://www.w3.org/publishing/epubcheck/. If the command to invoke it is `java -jar /Applications/epubcheck-4.2.6/epubcheck.jar`, add this line to ~/.ebookmaker or /etc/ebookmaker.conf: `epub_validator: java -jar /Applications/epubcheck-4.2.6/epubcheck.jar`. Then turn on validation by adding `--validate` to Ebookmaker's command line invocation or by setting validate to true in ~/.ebookmaker.
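
The down-conversion of HTML5 tags described in the first item of this release (`<section>` becomes `<div class="section">`, other attributes preserved) can be sketched with the standard library as follows. The function name and the choice to append the new class after any existing classes are assumptions, not a description of Ebookmaker's code.

```python
import xml.etree.ElementTree as ET

HTML5_TAGS = {"section", "figure", "figcaption", "header", "footer"}


def downconvert_html5(root):
    """Rename HTML5-only elements to div, adding the old tag name as a class."""
    for el in root.iter():
        if el.tag in HTML5_TAGS:
            classes = el.get("class", "").split()
            classes.append(el.tag)          # e.g. "section"
            el.set("class", " ".join(classes))
            el.tag = "div"                  # other attributes are untouched
    return root


if __name__ == "__main__":
    body = ET.fromstring(
        '<body><section id="c1" class="chapter"><p>Hi</p></section></body>'
    )
    downconvert_html5(body)
    print(ET.tostring(body, encoding="unicode"))
```

A matching CSS pass would rewrite selectors such as `section` to `div.section`, which this sketch does not attempt.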

0.11.29

- for HTML5, remove Content-Language metas
- when converting a presentational attribute to css in a style attribute, put the added css *before* existing content of the style, so as not to override it. this mimics browser behavior for cases when the two styles conflict. This won't do much good right away because tidy strips the styles into named classes.
- stop adding a viewport meta tag. it turns out this interferes with good HTML5 designs for mobile.
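
The ordering rule for converted presentational attributes (added css goes *before* the existing style content) can be illustrated with a small helper. `prepend_style` is a hypothetical name and the exact separator handling is an assumption.

```python
def prepend_style(existing, added):
    """Put newly generated CSS before the existing style content.

    In CSS, a later declaration of the same property wins, so the
    author's original style still overrides the generated one.
    """
    added = added.strip().rstrip(";")
    existing = (existing or "").strip()
    if not existing:
        return added + ";"
    return f"{added}; {existing}"


if __name__ == "__main__":
    # align="center" was converted, but the author's style says left.
    print(prepend_style("text-align: left", "text-align: center"))
```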

0.11.28

- fix 100. The behavior of `--output-file` has changed. A string passed using this argument is now used in the file name where the Gutenberg ID would be. Previously it was the name of the output file, no matter the file type, except for Kindle, PDF and TeX. File naming for Kindle was broken completely. In the past (version <0.11) `--title` would override the parsed or looked-up title. The title would be used in the file name if there was no Gutenberg ID or `--output-file`.
- docutils rst conversion introduced a typo in 0.18 resulting in some css problems
- added exception handling in ImageParser for broken images
- don't select cover until it's needed. Ebookmaker has been generating unneeded covers in the txt step because it hasn't parsed an html file.
- for HTML5, fixed a css syntax error in the css added for the tablecols attribute
- for HTML5, make sure lang and xml:lang attributes are in sync; put invalid langs in data-invalid-lang attribute.
- for HTML5, remove height or width attributes that are 0 or empty
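
Removing `height` or `width` attributes that are 0 or empty could be done along these lines. This is an illustrative sketch using the standard library; `drop_empty_dimensions` is a hypothetical name and treating whitespace-only values as empty is an assumption.

```python
import xml.etree.ElementTree as ET


def drop_empty_dimensions(root):
    """Delete height/width attributes whose value is '0' or empty."""
    removed = 0
    for el in root.iter():
        for name in ("height", "width"):
            value = el.get(name)
            if value is not None and value.strip() in ("", "0"):
                del el.attrib[name]
                removed += 1
    return removed


if __name__ == "__main__":
    doc = ET.fromstring(
        '<div><img src="a.png" width="0" height="120"/>'
        '<img src="b.png" width="" height=""/></div>'
    )
    print(drop_empty_dimensions(doc))
    print(ET.tostring(doc, encoding="unicode"))
```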

0.11.27

- one more fix for docutils 0.18+
