Pipeline to extract text from textbooks or academic papers (from PDF, for now), split it into sections, annotate those sections (table of contents, bibliography, etc.), and write the result to a parquet file.
0.0.4
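The sectioning and annotation steps can be illustrated with a minimal stdlib-only sketch. It is not this package's implementation. It assumes that PDF extraction has already produced plain text, and it uses a hypothetical heading rule (`HEADING_RE`), a hypothetical label table (`LABELS`), and hypothetical names (`Section`, `section_text`). It splits the text at heading lines and tags each section as table of contents, bibliography, index, front matter, or body.

```python
import re
from dataclasses import asdict, dataclass

# Hypothetical heading rule: numbered titles ("1 Introduction", "2.3 Methods")
# or a few well-known unnumbered titles, each on its own line.
HEADING_RE = re.compile(
    r"^(?:\d+(?:\.\d+)*\s+\S.*|(?:table of )?contents|bibliography|references|index)$",
    re.IGNORECASE,
)

# Hypothetical mapping from a lower-cased heading to an annotation label.
LABELS = {
    "contents": "table_of_contents",
    "table of contents": "table_of_contents",
    "bibliography": "bibliography",
    "references": "bibliography",
    "index": "index",
}


@dataclass
class Section:
    title: str
    text: str
    label: str


def _make_section(title: str, body: list[str]) -> Section:
    # Text before the first heading has no title and is labelled front matter.
    label = LABELS.get(title.lower(), "body") if title else "front_matter"
    return Section(title, "\n".join(body).strip(), label)


def section_text(text: str) -> list[Section]:
    """Split extracted plain text at heading lines and annotate each section."""
    sections: list[Section] = []
    title, body = "", []
    for line in text.splitlines():
        stripped = line.strip()
        if HEADING_RE.match(stripped):
            # A new heading closes the section collected so far.
            sections.append(_make_section(title, body))
            title, body = stripped, []
        else:
            body.append(line)
    sections.append(_make_section(title, body))
    # Drop the untitled leading section when nothing preceded the first heading.
    return [s for s in sections if s.title or s.text]


if __name__ == "__main__":
    sample = "\n".join([
        "Contents", "Introduction 1", "Methods 3",
        "1 Introduction", "We study sectioning.",
        "2 Methods", "Regex rules.",
        "References", "[1] A. Author. A Book. 2020.",
    ])
    for row in (asdict(s) for s in section_text(sample)):
        print(row["label"], "|", row["title"])
```

The rows produced by `asdict` are flat records. With pandas and a parquet engine (pyarrow or fastparquet) installed, they could be written with `pandas.DataFrame(rows).to_parquet(path)`. Real PDF output needs far more robust heading detection than a single regex.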