Wikinews:Water cooler/policy/archives/2014/December


Archiving policy and dead links

Is there any sort of precedent for updating dead links in archived articles with links to a web archiving service, e.g., Archive.org or WebCite? If not, then I'd like to propose that we update Wikinews:Archive conventions to specifically allow for updating the link to go to an archived version of the page. Looking at "Missing dog's severed head found by 17-year old girl" for example, the link to the CNN source is no longer working, but Internet Archive has a link to a version archived March 25, 2007. For updating the actual {{source}} template, we might need a new template with parameters for the original and archived URLs and the archive date. As an example, compare

AP. "Girl finds missing dog's head in box at house" — CNN, March 15, 2007

and

AP. "Girl finds missing dog's head in box at house" — CNN, March 15, 2007. Archived from the original on 27 March 2007.

Any thoughts? Suggestions? —Mikemoral♪♫ 07:35, 7 October 2014 (UTC)

  • We've kept well away from that previously. Not because we're keen to keep broken links, but because we've had people try to go directly to such archiving services and serve up a withdrawn version of a source for use in review.
The cited example is perfectly legitimate, unless CNN decide to dick with their robots.txt and kill off archive.org keeping copies public. Other sources have already gone down that route, which makes chasing archived copies a potentially-recurring headache.
Personally, I don't think chasing down valid links to archived copies of sources, on our already-archived articles, is of sufficient benefit to the project mission to be worth devoting significant time to. Were someone to put forward a compelling argument to do so, my suggestion would be that this is done as an additional parameter on {{source}} so we preserve the originally cited, and reviewed-from, URL. Note: Editing {{source}} should be done with extreme caution. --Brian McNeil / talk 07:46, 7 October 2014 (UTC)
It's not necessarily an important task to go around doing, but my suggestion here was to amend WN:ARCHIVE to make it explicitly allowed to add archived copies of pages. Rather than changing {{source}}, we might try something like an {{archived source}}, perhaps like the citation I used above. It would certainly be easier than trying to edit {{source}} without breaking it. —Mikemoral♪♫ 05:47, 8 October 2014 (UTC)
  • That makes sense; not a template to extend the list of sources, but to provide a "[Archived version]" link after the now-dead source. Details like which archive, and date retrieved, could be available via a tooltip on the link. That'd keep the format/layout disruption to a minimum. --Brian McNeil / talk 07:28, 8 October 2014 (UTC)
Hm, somehow missed the point about a separate template on first reading. I can see the merit. Then again, we could add it along with another field I've had in mind to add for some time, in a single edit to conserve server kitties. --Pi zero (talk) 11:33, 8 October 2014 (UTC)

┌─────────────┘
For a mention on WN:ARCHIVE, probably something in the "Post-archive edits" section along the lines of "The {{source}} template may be updated [or {{archived source}} template may be added] if a link to a source is dead to include a new link to an archived version from an archival service such as the Internet Archive or WebCite" would do. I don't think that contributes to instruction creep much, but it's an idea. Updating {{source}} or creating a new template might be the only issue here, though the latter is likely easier to accomplish. —Mikemoral♪♫ 06:32, 9 October 2014 (UTC)

Creating a separate template would create an unbounded future maintenance liability, of making sure changes to one template are echoed in the other. That's undesirable.
When adding a field with significant logic, one could use the same technique I've already used with the {{source}} template's handling of author fields written in all-caps: call a subtemplate. Thereafter, one can change the behavior of that one feature by editing the subtemplate, with far less impact than an edit to {{source}} itself. (In the case of all-caps author fields, changes to that one feature now impact a template used on fewer than 1300 pages (and shrinking) rather than one used on more than 20,000 (and growing).) --Pi zero (talk) 11:18, 9 October 2014 (UTC)
Then it would seem that incorporating it into the {{source}} template is the more desirable option here, though making such a change is rather beyond my abilities. —Mikemoral♪♫ 09:38, 10 October 2014 (UTC)

I've been experimenting a bit with this tonight, trying to make it resist abuse. I've built in three measures, but I'm wondering whether they're enough.

  • only works if parameter brokenURL is set. (Got this idea from an earlier experiment by brianmc; the template already supports brokenURL.)
  • requires the user to set a parameter post-review archived, the name of which is a reminder not to use it before review.
  • displays a message of the form [historical archive version], where the word "historical" warns reviewers not to use it during review.

The version I've set up also has a lang parameter, something we've been thinking about adding for years. My mock-up is currently at User:Pi zero/source. --Pi zero (talk) 06:28, 30 November 2014 (UTC)

@Pi zero:, @Mikemoral:, @Brian McNeil: For what it's worth (and as pointed out by Mike above), Internet Archive isn't the only archiving source--there's also WebCite. We can easily have a bot that makes WebCite citations or for that matter, we can save citations here as well. Take screenshots and download archive copies that aren't publicly viewable unless the original goes down or somesuch. —Justin (koavf)TCM 06:40, 30 November 2014 (UTC)
Lots of ways to skin the cat, which is probably why Grumpy Cat looks so grumpy.
Even when Wikinews was just getting going in 2005, there were news sources putting up their paywalls. Having a version of a news story written applying NPOV, which would never vanish behind a paywall, was one of the concerns driving the setup of the project. That's why I'm not overly worried about keeping links alive, but do appreciate that some work on that will help illustrate that review is making sure our articles reflect sources accurately. --Brian McNeil / talk 11:57, 30 November 2014 (UTC)
Well, the nice thing about separating the details of handling into a subtemplate is, the subtemplate is initially not heavily used and therefore can be changed without affecting 19.5k articles. I may deploy the change at {{source}} later today, unless somebody objects. --Pi zero (talk) 12:45, 30 November 2014 (UTC)
Before deploying, though, I'll want a better name than post-review archived; I want something more idiot-resistant (we all know nothing is idiot-proof). I'm also thinking I'll want some sort of single-page-database approach to languages, as an alternative to transcluding one of a myriad templates that I've been wanting for years to replace with a more unified approach anyway. --Pi zero (talk) 13:25, 30 November 2014 (UTC)
  • Uhm, I'm not seeing this "post-review archived" in the test template/subpages. However, why can't this be done with {{PROTECTIONLEVEL:Edit|PageName}}? If it doesn't return "sysop", then the page isn't archived. --Brian McNeil / talk 21:20, 30 November 2014 (UTC)
Oh, cool. Yes that's very useful.
In tinkering, I'd changed "post-review archived" to, um, "archived after Wikinews publication". Which, yes, is awfully long. Likely to get shorter again, given a way to determine protection level.
At the moment I'm planning out an upgrade to a function in the Lisp interpreter I've set up for succinct powerful manipulations. Once I've made the upgrade, I'll be able to reduce a table to a list of lists of strings in a single function call (highly desirable because in this context, administrative overhead of multiple function calls is very disruptive). --Pi zero (talk) 22:36, 30 November 2014 (UTC)
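For illustration, the protection check suggested in this exchange could be sketched in wikitext roughly as follows. This is only a rough sketch, not the code that was deployed: it assumes the ParserFunctions extension (#if, #ifeq) is available, and archiveurl is a hypothetical parameter name that does not come from the discussion.

```wikitext
<!-- Sketch only. Show the archive link when all three conditions hold:
     1. the current page is fully protected (edit level "sysop", i.e. archived),
     2. the citation is flagged with brokenURL=true,
     3. an archive URL was supplied ("archiveurl" is a hypothetical name). -->
{{#ifeq: {{PROTECTIONLEVEL:edit}} | sysop
 | {{#ifeq: {{{brokenURL|}}} | true
    | {{#if: {{{archiveurl|}}}
       | &#91;[{{{archiveurl}}} historical archive version]&#93;
      }}
   }}
}}
```

Because the check reads the protection level of the page itself, the link would stay hidden on any article that has not yet been archived, whatever parameters an editor sets.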

I've deployed changes to {{source}}. They do not (I believe and sincerely hope) change any pre-existing behavior; but archive urls and descriptions can be specified, and will show if our article is fully protected and the citation also has parameter brokenURL=true. Cf. {{source/archived}}. There's also a parameter lang=. --Pi zero (talk) 06:15, 7 December 2014 (UTC)
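
As a rough illustration of the announcement above, a citation on a fully protected article might be written something like this. Only brokenURL is taken from the discussion; archiveurl and archivedescription are assumed names for the archive URL and description parameters, the remaining field names are likewise assumptions, and both URLs are placeholders. The lang parameter is left out because the announcement does not say what values it takes. Check the documentation of {{source}} and {{source/archived}} before relying on any of these names.

```wikitext
<!-- Hypothetical example: parameter names other than brokenURL are assumptions,
     and the URLs are placeholders rather than real addresses. -->
{{source
|url       = http://example.com/original-story
|title     = Girl finds missing dog's head in box at house
|author    = AP
|pub       = CNN
|date      = March 15, 2007
|brokenURL = true
|archiveurl         = http://example.org/archived-copy
|archivedescription = archived copy at the Internet Archive
}}
```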