Problems saving discussion thread from browsers

(Background: I read/write off-line from the internet, so try to download complete discussion threads to HTML or PDF files to read off-line.)

I’ve noticed that when opening large threads (e.g. 72, 115, etc. posts), they appear complete in browsers (Firefox and Safari). But when doing a “Save” to PDF or HTML or PostScript from either browser, or even trying to “Select all” then copy and paste the contents into a text document, big parts invariably turn out blank: just the first couple of posts, or the last bunch; or some posts, then blank, some more, then blank again…

Testing such a save or copy/paste with the modem off (in case the data is re-accessed while “saving”) gives just the same result. I.e. the data is there in the browser, but something inhibits it from being wholly saved.

Any clue to this? Has it been experienced on other platforms too? (Something about WordPress data? Or is this some other platform?)

(Here using an iMac with OS X 10.8.5, 8 GB RAM, the latest OS and browser updates, etc., after having survived 30+ years in the software profession…)

Frustrating.

Try Chrome. :slight_smile:

I just seamlessly saved the long vitakka/vicara thread here, as well as a 250-page PDF from a huge thread over at meta.discourse.

I also tested Firefox (on Ubuntu). It was a little odd: its own native popup blocker blocked the print dialogue! But anyway, it printed the vitakka/vicara thread just fine, although it choked on the huge thread at meta.discourse.

This issue has been discussed extensively on meta.discourse, and it has apparently been solved for Chrome and Firefox, at least. Each browser has its own quirks in how it implements the print-to-PDF feature.

The underlying problem is that Discourse is a just-in-time app, which progressively fetches data as the page loads. This works well on the web, but it is not designed for static content. The developers regard this as a niche use.


Chrome – no difference.

BUT in scanning the issue over at meta.discourse, I found “Ctrl-P”, which, once Firefox is allowed to open pop-ups for this particular website, appears to work: it creates a full PDF rendering…

AND it incorporates what look like genuine date-time stamps, e.g.
“atipattoh 2017-01-09 09:53:20 UTC #139

That would have been another pet-peeve to bring-up sometime…

Many thanks.

I’m sorry, I assumed that’s what you were doing. How else do you save to PDF?

I had been using iMac File menu --> “Save Page as…” and/or “Print” --> PDF…

Couple more notes on the “Print long topic to PDF” issue (meta.discourse technical thread)

After testing and writing the thank-you-it-works post last night, I noticed that the feature stopped working: the Ctrl-P pop-up window appeared, but nothing rendered in it, and no print dialog came up. Checking back at the big meta.discourse discussion, I found the specs from the person who implemented the hack:

“– It renders the no-js crawler version.
– It renders a big HTML page with 1000 posts per page.
– It’s rate limited to 10 prints / user / hour.
– You can use your browser/OS/virtual printer to convert to PDF (1000 test posts is 277 pages, 6.1 MB).”

The third item explains it: I had apparently exceeded the “10 prints / user / hour”. The next day (a different hour), it all worked fine again. I didn’t notice any mention of why this limitation was needed.
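For what it’s worth, the print view those specs describe can also be reached directly by URL; the route is the topic path plus “/print” (the same page the error messages in this thread refer to). A minimal shell sketch, where the topic slug and id are made-up examples:

```shell
# Build the print-view URL for a Discourse topic.
# The "/print" route is the one mentioned in this thread;
# the topic slug and id here are hypothetical examples.
topic="https://discourse.suttacentral.net/t/example-topic/1234"
print_url="${topic%/}/print"   # strip any trailing slash, append /print
echo "$print_url"
```

Opening that URL in a browser should bring up the same full-thread rendering as Ctrl-P, subject to the same rate limit.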

Another note: the realization that the whole SC forum edifice rests on yet another eternally “in progress” programming platform, a perfect-storm anicca situation. That sent shivers down my spine, so to speak. I will develop this a bit in the more relevant thread “What is the future of SuttaCentral?”.

This of course has nothing to do with the preservation of the texts on SuttaCentral, which is built on an entirely different platform.

A “discussion” is not the same thing as an ancient sacred scripture; it is about creating and nurturing a culture. We’re committed to keeping this forum alive indefinitely, but it will undoubtedly be replaced by something at some point.

Discourse was chosen because it is open source, and built by reputable developers with a strong history of maintaining projects and seeing them through. They are committed to evolving the platform for the next twenty years or so, which is probably as good as it gets.

As far as the data goes, however, there is a strict separation between data and application. The posts are kept as Markdown, so they are readable as plain text. If you want to preserve your posts, you can download them. Or you could use wget or something to download the whole site. This avoids the critical problem of Word, Adobe, and other proprietary formats that mix up the application logic with the text.
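A minimal wget sketch of what such a whole-site download might look like (the command is printed here rather than executed, since running it would crawl the live site; the exact flags are a suggestion, not a prescription):

```shell
# Mirror a site for offline reading with wget.
#   --mirror           recursive download with timestamping
#   --convert-links    rewrite links so pages work locally
#   --page-requisites  also fetch CSS, images, etc.
#   --wait=1           pause between requests, to be polite to the server
site="https://discourse.suttacentral.net/"
cmd="wget --mirror --convert-links --page-requisites --wait=1 $site"
echo "$cmd"   # print the command instead of running it here
```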


A footnote on the text-rendering algorithm used here (large threads are rendered piece by piece, and unless you use the Ctrl-P feature, they are also “printed” to file in bits)…

A by-product of this is that the “search” function works poorly (potentially not at all) on large threads online: only the currently rendered parts get searched.

On the other hand, saving the whole thread to file (the Ctrl-P method) allows text-searching the whole thread in the PDF file. (It works; I tried it.)
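That search can even be done from the command line, assuming the pdftotext tool (from poppler) is installed. A sketch, printed rather than run, with a hypothetical filename:

```shell
# Extract the saved PDF to plain text on stdout, then grep the whole
# thread for a term, with line numbers. Shown as a sketch only.
pdf="vitakka-vicara-thread.pdf"   # hypothetical saved-thread filename
cmd="pdftotext '$pdf' - | grep -n vicara"
echo "$cmd"
```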

This info might go into some how-to department on the SC website?

Search works just fine on a whole thread. I’ve just tested it on the vitakka-vicara thread with multiple terms, and it quickly found all instances, regardless of where they appear in the thread. :mag_right: :thumbsup:

Oh yes, the built-in “search” function.

I was using the Edit --> Find text search in the browser.

This “search”, it turns out, is quite good. I’m more used to implementations in other forums that aren’t as handy.

Has anyone found a way to fix this? Or maybe another way to download threads?

I’ve been trying to download a topic of 111 posts (making sure to load the whole topic beforehand) and the result is always an HTM/HTML or PDF file with 50 or so blank pages, with maybe 20 that have text. I tried with Safari, Firefox, and Chrome, both saving as HTM/HTML/webarchive and printing as a PDF.

Been through that, or something quite similar.

The Discourse platform (which hosts the SuttaCentral forum) downloads posts in buffered blocks rather than the whole thread when it’s over a certain size. Browsers generally print or save to file only what’s buffered, hence the blank areas.

Solution found (from a hint by Ven. Sujato), on an iMac:
Use Ctrl-P, which apparently invokes a function in Discourse rather than the browser’s native print function, or at least does the work of fully downloading the thread before invoking the local browser function. It brings up a window with the thread contents and a dialog for printing to a printer or to a PDF file. This downloads the whole thing, which conveniently also attaches a sequential post number to each message, as well as a genuine date-and-time stamp. That is to say, better documentation than what usually appears in the browser.

I believe this is a function of the Discourse platform. There does, however, appear to be a built-in limit on how often it can be used: something like five or ten times per fifteen minutes or half-hour or so. Just wait, then come back for more.


What thread is this? And please let me know exactly what you are doing.

I just saved to PDF a thread of nearly 500 posts, seamlessly and swiftly, without any problem. I went over to meta.discourse to find a longer thread, and saved one of 785 posts, creating a PDF of 10MB, again, with no problems either in Chrome or Firefox.

You don’t have to do this. Just open the thread, ctrl + P, save as PDF, done.

I’m not sure what the status is of the print function discussed in the thread on meta.discourse. To me it just works like on any other page.

So it says on meta, but I just tested it by saving a long thread 30 times or so, with no issues.

Apparently, on the iMac (OS X 10.8) platform, the native Firefox Print --> PDF function accesses just what’s buffered from Discourse (per the “just-in-time” data delivery).

Perhaps it behaves differently on other OS platforms, in particular UNIX types. (What platform and functions was samvesa using?)

Using Ctrl-P seems to access a distinct Discourse function, which delivers the complete thread (at least up to some limit not yet encountered) to a window in Firefox on the iMac, and then invokes the client-native function to print those contents to file.

Another test: after successfully accessing five threads via Ctrl-P (the ones I wanted to have offline), any further attempts (just for testing purposes), even with tiny threads (no more than the OP), failed: the thread window opens but with no content.

One possible (and likely) explanation: accessing the same thread repeatedly (e.g. “30 times or so”) may hit data already buffered somewhere in Discourse, and hence not invoke the download-counting function.

@cjmacie @sujato

Cmd/Ctrl+P works perfectly; it even keeps all the URLs and quote structures.


Just FYI, MacOS is a certified Unix system, based on BSD, but Linux is merely “Unix-like”, oddly enough.

I just checked, and there is indeed rate limiting for saving threads. Perhaps your explanation of why I didn’t experience it is correct, or perhaps it’s because I’m an admin.

Anyway, I’ve increased the rate limit to 20, which I hope will be enough. I’m not really sure why there needs to be a rate limit; are they afraid it will be hacked in some way, I wonder?

Right… I’d forgotten; Jobs brought that OS with him from NeXT.

Having different privileges was another possibility that came to mind.

Thanks. I don’t download more than 6 or 8 or so in any one session.

Could be they want to protect the site’s bandwidth against massive download attempts, which could result in slow-downs, or perhaps even denial-of-service type crashes.

That would be my guess. Anyway, we’ll see if there’s any problem.

Thanks for testing this, and please let us know if you have any more issues.

Also still buggy for me. I now get this message on the /print page:

{“errors”:[“You have performed this action too many times, try again later.”]}

Android Chrome.
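That error body is plain JSON, so a script fetching the /print page could detect the rate limit and back off rather than retry immediately. A minimal sketch, with the response string hard-coded from the message above rather than fetched live:

```shell
# The rate-limit body Discourse returns on the /print page, as quoted
# above; hard-coded here for illustration rather than fetched live.
response='{"errors":["You have performed this action too many times, try again later."]}'

# Detect the error key and back off instead of retrying right away.
if printf '%s' "$response" | grep -q '"errors"'; then
  echo "rate limited -- wait before retrying"
fi
```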

I can only suggest you bring this up at meta.discourse.net. As I mentioned above, the developers regard offline use as a niche issue, and I would guess saving threads offline on mobile is an even narrower niche, I’m afraid.
