I have a script that downloads each page of a thread (from ST at the moment) and saves the HTML in a txt file.
I also have a script that will make a simple HTML table of all the posts in a downloaded file.
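For anyone curious, here is a rough Python sketch of what two scripts like these might look like. It is not the actual code: `download_thread`, `posts_to_table`, and the `{page}` placeholder in the URL template are all made-up names for illustration, and the forum's real paging parameter is not shown here, so you would have to supply it yourself.

```python
from html import escape
from pathlib import Path
from urllib.request import urlopen


def fetch_html(url):
    """Download one URL and return its body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def download_thread(url_template, num_pages, out_dir="pages", fetch=fetch_html):
    """Save pages 1..num_pages of a thread as page_<n>.txt files.

    url_template is an assumption: it must contain a "{page}" placeholder,
    because the forum's real paging parameter is not known here.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for page in range(1, num_pages + 1):
        html = fetch(url_template.format(page=page))
        path = out / f"page_{page}.txt"
        path.write_text(html, encoding="utf-8")
        saved.append(path)
    return saved


def posts_to_table(posts):
    """Build a simple HTML table from (author, text) pairs."""
    rows = "\n".join(
        f"<tr><td>{escape(author)}</td><td>{escape(text)}</td></tr>"
        for author, text in posts
    )
    return f"<table>\n<tr><th>Author</th><th>Post</th></tr>\n{rows}\n</table>"
```

The `fetch` argument is only there so the download loop can be tried out without touching the network; by default it does a plain anonymous request.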
I am offering, to anyone, for free, this service.
Just post here with a link to the first page of the thread, the number of pages in said thread and (if applicable) permission from the author of the thread.
I have already done David's (dedgren) CJ, Three Rivers Region and the results will be appearing in the 3RR-ST board in the 3RR section of this site :)
Joe
Ah, so that's what David is doing in those new ST 3RR threads! ::)
Would you be able to do this for my CJ at Simtropolis? (The Greater Terran Region). Although I am now updating it here, the MD is missing dozens and dozens of updates that I just don't have time to bring over. If you would be able to save it as a html file, that would be wonderful!
You can find the start of the thread here:
http://www.simtropolis.com/forum/messageview.cfm?catid=36&threadid=95625&enterthread=y
-Since I am the author, I grant you permission :P
I think another good one to do would be the RHW thread at ST. However, the author (qurlix) has not been active for some time so I don't know how you can get authorization. The link is here:
http://www.simtropolis.com/forum/messageview.cfm?catid=124&threadid=67624&enterthread=y
Good luck!
Best,
-Haljackey
Edit: You're right, I did forget the number of pages. It is currently 31 pages long.
You forgot a vital part... the number of pages :D, but as it's my first I will forgive you $%Grinno$%
I will leave the RHW thread until there is some "higher" authentication...but your CJ is coming right as I type this :)
... Done, in less than 2 mins :)
now I just need to get the parser to write to files :)
Joe
Edit: WOOO my parser works too :)
Hey Hal, thanks for being my test subject ::) $%Grinno$%
I got you that backup you wanted, don't know how to get it to you though
and as a bonus for being my first customer (and a request by CasperVg ;)) I got your Show Us Your Interchanges thread as well :) Downloaded and parsed in under 2 mins :o
Joe
Quote from: JoeST on August 06, 2008, 02:57:32 PM
and as a bonus for being my first customer (and a request by CasperVg ;)) I got your Show Us Your Interchanges thread as well :)
:angrymore:
What? Without my permission? :bomb: I am the author of that thread! %bur2$
Nah, thanks! :P Glad to see that you are having luck with it.
You don't know how to get it to me??? $%Grinno$% Just take your time with it. ;)
Great work! :thumbsup:
Best,
-Haljackey
While I am at it, do you want your Multi-RHW Guide? :D
and yeah I got yours, but only for personal use... :P
so, like, email it to you?
/me waits for a customer :)
Joe
* A customer arrives *
You probably never heard of backing up these threads before this post ;D
You know you could back up the Discovery threads in the Modding - R&D board.
All 2 pages long except the RUL and SC4Paths which has 6.
Jonathan
Hey Jon
guess what.... I was just looking at those threads :D
I will get them to you asap.
Joe
I know this thread's kinda dead...
but another customer is here if you're still open. There are several private threads on Simtropolis that I would like to have an html copy of, if you could. I can give you links, pages, etc., but you'll need to be added to the member list first, which I can't do, as I didn't start the threads, though I have permission to back them up.
Tell me if you're willing to do it and I'll give you links.
Thanks,
DTP
Yeah, I will back them up. I just have to find the script and get it up and running, so it might be a few days before I am ready :)
joe
OK, thanks! That's fine. I'll have you added to the member lists and PM you the links then.
EDIT: Joe, what's your username on ST? I can't seem to find you...
I'm star.torturer hehe
Found you. :D
Links:
http://www.simtropolis.com/forum/messageview.cfm?catid=1&threadid=102607&enterthread=y (2 Pages)
http://www.simtropolis.com/forum/messageview.cfm?catid=1&threadid=103100&enterthread=y (3 Pages)
(You should be added to the member lists already)
Oh, just realised, I won't be able to auto back up... due to my script acting as a guest...
If you want the parsing thing, then I would suggest saving them individually as full HTML and I can then run them through the parsing script as a batch ;)
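The batch step might look something like this in Python. It is only a sketch: `parse_saved_pages` is a made-up name, and `parse_page` stands in for whatever the real parsing script does with one page of HTML.

```python
from pathlib import Path


def parse_saved_pages(folder, parse_page):
    """Run parse_page over every saved .htm/.html file in folder.

    parse_page is a placeholder for the real parser: any function that
    takes a page's HTML as a string. Results are keyed by file name.
    """
    results = {}
    for path in sorted(Path(folder).glob("*.htm*")):
        html = path.read_text(encoding="utf-8", errors="replace")
        results[path.name] = parse_page(html)
    return results
```

Because the pages are saved by hand from a logged-in browser, this never touches the forum itself, which sidesteps the guest problem.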
Right, I am back and ready for business.
I have a bash script that downloads each page of a topic from ST, and a py script that splits them (roughly) into posts. I can skim for a particular poster's contributions too. It currently backs up the whole thread as pages (to `./pages/page_#`), and can then split each page into posts (`./posts/post_#-$AUTH`, saved as HTML rather than just the post content) and keep only those by a specific author.
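A rough idea of how the splitting and filtering could work, sketched in Python. The `POST_START` pattern is purely an assumption (the real ST markup is different and not shown here), and `split_posts` and `write_posts` are illustrative names, not the actual script.

```python
import re
from pathlib import Path

# Assumption: each post starts with a tag like
# <div class="post" data-author="NAME">. Swap in the real forum markup.
POST_START = re.compile(r'<div class="post" data-author="([^"]*)">')


def split_posts(page_html):
    """Return (author, html) pairs, one per post found in the page."""
    matches = list(POST_START.finditer(page_html))
    posts = []
    for i, match in enumerate(matches):
        # A post runs from its start tag to the next post's start tag.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(page_html)
        posts.append((match.group(1), page_html[match.start():end]))
    return posts


def write_posts(pages, out_dir="posts", only_author=None):
    """Write each post to post_<n>-<author>, optionally for one author only."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    number = 0
    for page_html in pages:
        for author, html in split_posts(page_html):
            number += 1  # numbering counts every post, even skipped ones
            if only_author is not None and author != only_author:
                continue
            safe_author = re.sub(r"[^\w.-]", "_", author)
            path = out / f"post_{number}-{safe_author}"
            path.write_text(html, encoding="utf-8")
            written.append(path)
    return written
```

Post numbers keep counting across skipped authors, so a filtered backup still shows where each post sat in the thread.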
Any particular threads anyone wants?
Joe