extractKinja - Another backup solution [UPDATE : Initial batch import support]


[UPDATE on the 13th] It is now possible to batch import the articles listed on one page, such as the main page of a blog or the “Posts” page of an author.

I made a tool that extracts articles from Kinja blogs and keeps only the content part of each article (no header, footer, comments or “You might also like”), while saving or replacing any external content that would otherwise be fetched from Kinja.

No JavaScript from Kinja has been reused: I wrote only the minimal code needed to display the tags menu and to open images in a new tab when you click on them. The YouTube, Twitter, Vimeo, Dailymotion, Imgur and Instagram widgets/embeds from Kinja have been replaced by “standard” ones.

It saves the article(s) on the web server along with a copy of the images and videos (including the author avatar, the blog favicon and the thumbnail used on the main page). For images, only the highest resolution is kept.

It is still in beta (please let me know if you find an issue or have ideas). The next step is better batch import and, if possible, recreating the equivalent of the main page: a list of posts with the photo, title and author name.

How to use - To backup one article

Copy/paste either the ID of the article (e.g. 1845644279 for this page) or the full URL (including https://) of the article you want archived at the end of this URL: http://jbboin.phpnet.org/oppo/extractor/extractKinja.php?article= (for example: http://jbboin.phpnet.org/oppo/extractor/extractKinja.php?article=https://oppositelock.kinja.com/a-general-handbook-for-posting-on-oppositelock-1293992803)
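
If you would rather build that URL from a script than by hand, here is a minimal Python sketch. The helper name `article_backup_url` is made up for illustration and is not part of extractKinja. Percent-encoding the article URL is also an assumption about what the script accepts; pasting the raw URL as described above is the documented way.

```python
from urllib.parse import quote

# Base URL of the extractor, as given in this post.
EXTRACTOR = "http://jbboin.phpnet.org/oppo/extractor/extractKinja.php"


def article_backup_url(article):
    """Return the extractor URL for one article, given its ID or its full URL.

    Hypothetical helper: it only builds the link, it does not fetch anything.
    """
    # safe="" also encodes ":" and "/", so a full article URL cannot be
    # confused with the extractor's own query string (assumption: the
    # script accepts a percent-encoded value).
    return f"{EXTRACTOR}?article={quote(str(article), safe='')}"


print(article_backup_url(1845644279))
print(article_backup_url(
    "https://oppositelock.kinja.com/"
    "a-general-handbook-for-posting-on-oppositelock-1293992803"
))
```

Opening the printed link in a browser (or fetching it from a script) is what triggers the backup.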

How to use - To backup articles listed on a page

Copy/paste the full URL (including https://) of the page you want archived at the end of this URL (you can set update=1 if you want the script to re-fetch the articles already in the archive, if their content has been modified for example): http://jbboin.phpnet.org/oppo/extractor/extractKinja.php?update=0&page= (for example: http://jbboin.phpnet.org/oppo/extractor/extractKinja.php?update=0&page=https://oppositelock.kinja.com/?startTime=16052546040000)
The operation can take more than a minute.
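
The batch URL can be built the same way. This is only a sketch: `page_backup_url` is a hypothetical helper name, and percent-encoding the page URL is an assumption about what the script accepts. It is worth considering here because the page URL can carry its own `?` and `=` (as in the `startTime` example), and an `&` inside it would otherwise be read as the start of a new parameter.

```python
from urllib.parse import urlencode

# Base URL of the extractor, as given in this post.
EXTRACTOR = "http://jbboin.phpnet.org/oppo/extractor/extractKinja.php"


def page_backup_url(page_url, update=False):
    """Return the extractor URL that archives every article listed on page_url.

    Hypothetical helper: update=True maps to update=1 (re-fetch articles
    already in the archive), update=False maps to update=0.
    """
    # urlencode percent-encodes the nested URL, including its "?" and "=".
    query = urlencode({"update": int(update), "page": page_url})
    return f"{EXTRACTOR}?{query}"


print(page_backup_url("https://oppositelock.kinja.com/?startTime=16052546040000"))
```

Since the import can take more than a minute, use a generous timeout if you fetch this link from a script rather than a browser.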

What has already been extracted is browsable here (it’s simply a DirectoryIndex at the moment)

Known bugs at the moment

Image galleries are not working, but the images/videos are saved anyway (you can access all the files of the article by removing “article.html” from the URL)
The comments are not integrated into the post. It’s not a bug, it’s a feature (for the time being at least), but they are saved in articleMetadatas.json
~~Poster avatar can be stretched in some case : At the moment it’s saving the highest resolution available for this image which might not be the one normally used by Kinja~~ FIXED
Tweets currently have a fixed height, which crops the big ones (with a video, for example); Instagram posts have the same issue
~~I haven’t worked on the embedded Instagram posts (as i haven’t found one) so it will still use the Kinja widget for the time being~~ FIXED
~~Vimeo embedding is not working (here for example)~~ FIXED
Links in the article to other articles are not modified, so they won’t work anymore once the Kinjapocalypse has happened
Instagram posts (sometimes?) look a bit... not normal

Source code here, if someone is interested.

PS: I initially put the wrong tool name in the post title... KinjaExtractor instead of extractKinja, sorry for the confusion :(