My wishlist feature, if anyone builds a site, is for the site to be easy to download/archive.
Something I HATE is when stories, galleries, and whole users vanish for reasons beyond my control.
I feel that if an interactives site were to be built, it should provide regular DB dumps of the relevant tables (SELECT * FROM interactive_chapters), (SELECT * FROM interactives), maybe stripped of IP addresses and any other non-public stuff.
Writing a scraper gets the job done, but it's not as trustworthy in terms of accuracy, and it likely consumes way more data allowance than needed because the bulk page elements get served over and over. These days those can be ~250KB per page, not counting linked libraries or images associated with a page.
I'm pretty sure implementing this would take a cron job, a shell (and maybe Python) script, and about a day to make sure no horrific bugs are present. I may try writing one myself for my local databases as an example.
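For anyone wondering what the script half of that might look like, here is a rough sketch, not a finished tool. It assumes the site's database is SQLite (a real site may well use something else) and that the non-public columns are called ip_address and email; both of those names are made up for illustration. Only the two table names come from the queries above.

```python
import csv
import sqlite3

# Hypothetical column names; swap in whatever the real schema treats as non-public.
PRIVATE_COLUMNS = {"ip_address", "email"}
# Table names taken from the queries mentioned above.
TABLES = ["interactives", "interactive_chapters"]


def dump_table(conn, table, out_path, private=PRIVATE_COLUMNS):
    """Write one table to CSV, leaving out any column listed in `private`."""
    # Table names cannot be passed as SQL parameters, so only allow known ones.
    if table not in TABLES:
        raise ValueError(f"refusing to dump unexpected table: {table}")
    cur = conn.execute(f"SELECT * FROM {table}")
    names = [d[0] for d in cur.description]
    keep = [i for i, name in enumerate(names) if name not in private]
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow([names[i] for i in keep])
        for row in cur:
            writer.writerow([row[i] for i in keep])
    # Return the published column names so the caller can log or check them.
    return [names[i] for i in keep]
```

Calling dump_table once per entry in TABLES from a small script, and running that script from a weekly cron entry, would cover the "regular dumps" part. An allow-list of public columns would be safer than this deny-list if the schema changes often.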
But I guess most people don't really care about archival and data preservation as much as I do.
Anyway, more relevant topic: Error cases
They happen, and I would like advice, if you have any to give, on how you would prefer I handle them.
When you have to parse out information from over a decade of grabs from a site that updates from time to time, you get to notice its little bugs.
Things like usernames sometimes being absent, with users only being recorded in the copyright details, leaving ambiguity over which (displayname/username) should be returned.
Things like 'ghost' users who have no associated copyright-box details to extract, meaning that for the first time I'm not actually returning a reliable value from my parser code.
(Previously, it would not spit out things like 'chapterparser_ERROR_GHOST_USER' when you asked for a username; it would just crash if it didn't understand the page.)
So if somewhere in one of my stories there's what looks like a big alarming error message, it's probably got to do with being unable to cleanly process some obscure quirk that was fixed 5 years ago, after I saved the most recent copy of the deleted interactive.
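To make the fallback idea concrete, here is a minimal sketch of one way it could work. This is not my actual parser: the two regexes describe invented page markup purely for illustration, and preferring the username over the display name is just one possible answer to the ambiguity mentioned above. The only thing taken from the real code is the sentinel string.

```python
import re

# Sentinel string quoted above; returned instead of crashing.
GHOST_USER = "chapterparser_ERROR_GHOST_USER"

# Hypothetical page markup, invented for this example only.
BYLINE_RE = re.compile(r'<a class="user" href="/user/([^"]+)">')
COPYRIGHT_RE = re.compile(r"Copyright \d{4} ([^<(]+?) \(([^)]+)\)")


def parse_username(html):
    """Return (username, source), where source records how the value was found."""
    m = BYLINE_RE.search(html)
    if m:
        return m.group(1), "byline"
    m = COPYRIGHT_RE.search(html)
    if m:
        # The assumed copyright box holds "Display Name (username)";
        # this sketch picks the username part.
        return m.group(2), "copyright"
    # Ghost user: nothing to extract, so flag it rather than raise.
    return GHOST_USER, "missing"
```

Returning the source next to the value means whatever consumes the output can tell a real username from a guess or a sentinel without comparing strings.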
I've got to rework my code a little so I can support resuming if something breaks; then I can probably just chuck my big list of everything at it:
https://mega.nz/#!ehAigSJZ!NAYnjbXws9vN ... lZGbL_9wAw
Thought I should be giving regular progress updates to you interested people.
Thinking I might do batches, release those as I go, and then release a package containing everything once it's all done?
Got to tweak code a bit to allow that. ATM things are a bit too tightly integrated, so I'm going to split things apart and use a list file I can just delete large chunks of as work is done.
Here is the code vaguely as it stands right now. It's in no way well made, and lots of preset hardcoded folders will need to be understood and reworked to make it run on a different machine. That's mostly due to this being a "get it working" project that took place over the better part of a decade rather than an "everything nice and proper" thing that gets wrapped up with a manual within 6 months.
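Roughly what I have in mind for the list file, as a sketch under assumptions rather than the real thing: one item per line, and a hypothetical handle function standing in for whatever does the actual per-item work.

```python
import os
import tempfile


def process_list(list_path, handle):
    """Run `handle` on each line of the list file, dropping lines as they finish.

    If `handle` raises, the failing entry and everything after it stay in the
    file, so running again picks up where it stopped.
    """
    with open(list_path, encoding="utf-8") as fh:
        pending = [line.strip() for line in fh if line.strip()]
    done = 0
    try:
        for item in pending:
            handle(item)
            done += 1
    finally:
        remaining = pending[done:]
        # Write a temp file and swap it in, so the list is never half-written.
        folder = os.path.dirname(os.path.abspath(list_path))
        fd, tmp = tempfile.mkstemp(dir=folder)
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.writelines(item + "\n" for item in remaining)
        os.replace(tmp, list_path)
    return done
```

This only rewrites the list when the run ends or an exception escapes, so a hard kill or power cut mid-batch would redo that batch. Rewriting after every item would fix that at the cost of a lot more disk writes on a big list.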
https://mega.nz/#!XgJ0SabT!b0EmOwZxcg39 ... nRWEWXlq5w
That's about all I've got the energy to write to keep you up to date, I'm going to bed.