| author | Kyle Javier [kj_sh604] | 2026-02-28 15:02:16 -0500 |
|---|---|---|
| committer | GitHub | 2026-02-28 15:02:16 -0500 |
| commit | 8f9756189c777074b88de39c2de1e2f7153352c2 (patch) | |
| tree | 9810f206bc9b97f1429bc7f207caf80f6ead7986 /README.md | |
| parent | fafc3e29832779b5ccbea8fd21dc9fd5af67de38 (diff) | |
| parent | 47a9736a1dfa8bfd4c5e5edd111e6ad28536066f (diff) | |
[merge] pull request #2 from kj-sh604/feat/use-dash-st-rewrite
# feat: use `feat/same-template-concat` branch `-st` implementation as main `kjandoc` binary
this pull request updates the project to improve the quality and fidelity of merged `.pptx` files, and simplifies dependencies.
the most significant changes are a rewrite of the merging approach to preserve editability and formatting, and the removal of several python dependencies that are no longer needed (as seen in the `feat/same-template-concat` branch, which I still have up)
## enhancements to merging functionality:
* the merging process now operates directly on the ooxml/zip structure of `.pptx` files, preserving full editability and achieving near-complete fidelity to the original formatting. slide masters, layouts, themes, notes, and embedded media are all copied, and identical media files are stored only once.
* a final libreoffice normalization step is used to clean up structural issues.
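
to illustrate the media deduplication mentioned in the first bullet, here is a minimal sketch, assuming identical media is detected by hashing the bytes of every part under `ppt/media/`. the function name `dedupe_media` and the `mediaN` output naming are hypothetical and not taken from the `kjandoc` source; the real implementation may work differently.

```python
import hashlib
import io
import posixpath
import zipfile


def dedupe_media(pptx_blobs):
    """Assign one output name per unique media payload across several decks.

    pptx_blobs: list of bytes objects, each the raw content of a .pptx file.
    Returns (renames, blobs) where
      renames[(deck_index, original_part_name)] -> output part name
      blobs[output part name] -> media bytes (stored once per unique payload)
    Hypothetical helper: the naming scheme is an assumption for illustration.
    """
    seen = {}     # sha256 hex digest -> output part name
    renames = {}
    blobs = {}
    for deck_index, data in enumerate(pptx_blobs):
        # A .pptx file is a ZIP archive, so zipfile can read it directly.
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for name in sorted(zf.namelist()):
                # Only media parts; skip directory entries.
                if not name.startswith("ppt/media/") or name.endswith("/"):
                    continue
                payload = zf.read(name)
                digest = hashlib.sha256(payload).hexdigest()
                if digest not in seen:
                    ext = posixpath.splitext(name)[1]
                    out_name = f"ppt/media/media{len(seen) + 1}{ext}"
                    seen[digest] = out_name
                    blobs[out_name] = payload
                renames[(deck_index, name)] = seen[digest]
    return renames, blobs
```

the `renames` mapping is what a merger would then use to rewrite the `Target` attributes in each slide's `.rels` file so they point at the shared media part; that rewiring step is not shown here.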
## dependency updates:
* removed unnecessary dependencies from `src/requirements.txt`, including `pillow`, `python-pptx`, `typing_extensions`, and `xlsxwriter`, leaving only `lxml` as a required python package.
Diffstat (limited to 'README.md')
| -rw-r--r-- | README.md | 23 |
1 files changed, 12 insertions, 11 deletions
@@ -10,31 +10,32 @@ https://github.com/user-attachments/assets/c7fe58c1-ff76-41bf-977b-870247a6a3e2

 ## what it does
 - merges multiple .pptx files into one
-- preserves visual formatting by rendering slides and rebuilding a new deck
+- preserves full editability, with 99% fidelity to the original formatting (some minor quirks may occur)
+- copies slide masters, layouts, themes, notes, and embedded media
 - `pandoc`-style usage: `kjandoc input1.pptx input2.pptx -o combined.pptx`

 ## why this exists
-`pandoc` is great, but it can't concatenate `.pptx` files.
+`pandoc` is great, but it can't concatenate `.pptx` files.

-this uses a headless libreoffice + pdf -> png rendering to get a merge with most formatting preserved.
+this works directly at the OOXML/ZIP level: it reads each `.pptx` as a ZIP archive, rewires all internal XML relationships, and writes a new near full Microsoft-compliant `.pptx`.

-the tradeoff is the output slides are images (not editable shapes).
+a final LibreOffice normalization pass cleans up any lingering structural quirks to prevent PowerPoint repair prompts (not guaranteed though).

 ## usage
 ```bash
 # pandoc-style usage
 ./kjandoc input1.pptx input2.pptx -o combined.pptx

-# tweak quality
-./kjandoc input1.pptx input2.pptx -o combined.pptx --dpi 150
+# merge more than two
+./kjandoc a.pptx b.pptx c.pptx -o combined.pptx
 ```

 ## deps
 - python3
-- libreoffice
-- poppler (pdftoppm)
-- python deps in requirements.txt
+- libreoffice (for the normalization pass)
+- python deps in requirements.txt (`lxml`)

 ## notes
-- output size is larger (images)
-- visuals stay intact for the most part
+- output slides are fully editable
+- masters and layouts from all source files are carried over
+- duplicate media files are deduplicated automatically
