A small, self-hosted HTTP API that turns HTML, URLs and JSON into PDF/A-3a
2 points, 0 comments on Hacker News
3 headlines
2 points, 0 comments on Hacker News
The Technical Problem: Websites Drift, Pipelines Don't Know
Long-running scraping pipelines have a structural assumption baked in: the URLs you configured last month still resolve today. That assumption is wrong more often than you'd expect. Sites reorganize their URL structure during CMS migra...
2 points, 0 comments on Hacker News