User:Amgine/Google News Sitemap
Google News Sitemap is an extension special page designed to provide an XML feed from a MediaWiki website, using categories, notcategories, and namespace as the primary selection criteria. It was originally developed to provide en.Wikinews with a sitemap feed for Google News, so that its content may be distributed.
XML feed formats
Sitemap
- The Sitemap schema is a very basic listing of URLs, with optional last modification and priority elements. The Google News extension includes the publication date and optional keywords elements. Currently all required elements, last modification, and keywords are supported; priority schema ideas are being actively solicited. See planned improvements.
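As a rough illustration of the elements named above, the following Python sketch builds a Sitemap document with a location, an optional last modification date, and the Google News publication date and keywords elements. It is not the extension's own code: the function name `build_sitemap`, the dictionary keys, and the sample values are hypothetical, and only the elements mentioned in the text are emitted. The namespace URIs are those of the public Sitemap 0.9 and Google News sitemap schemas.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"

# Serialize the Sitemap namespace as the default and the news one as "news:".
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("news", NEWS_NS)


def build_sitemap(entries):
    """Return a Sitemap XML string for a list of entry dicts (hypothetical shape)."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = entry["loc"]
        # Last modification is optional in the Sitemap schema.
        if entry.get("lastmod"):
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = entry["lastmod"]
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = entry["publication_date"]
        # Keywords are optional; here they are joined from a list of category names.
        if entry.get("keywords"):
            ET.SubElement(news, f"{{{NEWS_NS}}}keywords").text = ", ".join(entry["keywords"])
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([{
        "loc": "http://domain/wiki/Example_article",
        "lastmod": "2009-11-02",
        "publication_date": "2009-10-31",
        "keywords": ["Science", "Technology"],
    }]))
```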
RSS
- An RSS 2.0 compliant feed. The page uses the MediaWiki feed classes to provide the feed, which is very robust.
Atom
- An RFC 4287 (December 2005) compliant Atom feed. The page uses the MediaWiki feed classes to provide the feed, which is very robust.
URL parameters
Parameters are provided in the URL:
http://domain/wiki/Special:SpecialGNSM/[parameter=value][&parameter=value][...]
Parameters determine which articles will be found/returned, the order in which they are sorted, and which feed format will be used.
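The URL pattern above can be assembled mechanically. The sketch below is a hypothetical client-side helper, not part of the extension: `build_feed_url` and its arguments are made-up names, and it assumes parameters are appended after the special page name and joined with `&`, with spaces in values replaced by `_` as described for category names.

```python
def build_feed_url(base, params):
    """Build a feed URL from a list of (name, value) pairs.

    A list of pairs (rather than a dict) keeps the order and allows a
    parameter such as category to be given more than once.
    """
    parts = []
    for name, value in params:
        # Multi-word values use underscores instead of spaces.
        parts.append(f"{name}={str(value).replace(' ', '_')}")
    return base.rstrip("/") + "/" + "&".join(parts)


if __name__ == "__main__":
    print(build_feed_url(
        "http://domain/wiki/Special:SpecialGNSM",
        [("category", "Science and Technology"), ("feed", "rss"), ("count", 10)],
    ))
```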
category
http://domain/wiki/Special:SpecialGNSM/category=Published
Selects only articles which are members of the category value. Up to six (configurable) categories and notcategories may be provided; current behavior is to ignore any categories or notcategories beyond the sixth. For multi-word categories replace spaces with _, e.g. category=Science_and_Technology.
Options: string value
Default value = Published
count
http://domain/wiki/Special:SpecialGNSM/count=10
Returns at most count articles. Note that the count may not exceed the configurable maximum, nor be less than the configurable minimum.
Options: integer value
Default value = 50 (maximum)
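The bounds on count can be pictured as a simple clamp. This is only a sketch: the text does not say whether an out-of-range count is clamped or rejected, so clamping is an assumption here, as are the function name and the minimum of 1. Only the default maximum of 50 comes from the text.

```python
DEFAULT_MAX_COUNT = 50   # default (maximum) stated in the text
ASSUMED_MIN_COUNT = 1    # hypothetical; the real minimum is configurable


def clamp_count(requested, minimum=ASSUMED_MIN_COUNT, maximum=DEFAULT_MAX_COUNT):
    """Return a count that lies within [minimum, maximum]."""
    if requested is None:
        # No count parameter given: fall back to the default, which is the maximum.
        return maximum
    return max(minimum, min(maximum, int(requested)))
```

Under these assumptions, a request for 500 articles would be served as 50, and a request for 0 as 1.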
days
http://domain/wiki/Special:SpecialGNSM/days=7[...]
Limits the feed to articles added to the category in the past X days (in seconds). Currently only available for Sitemap feeds.
Options: integer
Default value = 3
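One plausible reading of the description above is that the day count is converted to seconds and subtracted from the current time to get a cutoff timestamp. The sketch below shows that arithmetic; the function names are hypothetical and the use of Unix timestamps is an assumption, not something the text confirms.

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60


def cutoff_timestamp(days=3, now=None):
    """Return the Unix timestamp that lies `days` days before `now`."""
    if now is None:
        now = time.time()
    return now - days * SECONDS_PER_DAY


def added_recently(added_ts, days=3, now=None):
    """True if an article was added to the category after the cutoff."""
    return added_ts > cutoff_timestamp(days, now)
```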
feed
http://domain/wiki/Special:SpecialGNSM/feed=[rss/atom/sitemap][...]
Produces different standard formats of feed.
Options: sitemap || atom || rss
Default value = atom.
namespace
http://domain/wiki/Special:SpecialGNSM/namespace=String_value
http://domain/wiki/Special:SpecialGNSM/namespace=3
Selects only articles which are in the named or numbered namespace. If this parameter is present more than once, only the last occurrence is used.
Options: integer value || string value
Default value = null
notcategory
http://domain/wiki/Special:SpecialGNSM/notcategory=Unpublished
Selects only articles which are not members of the notcategory value. Up to six (configurable) notcategories and categories may be provided; current behavior is to ignore any notcategories or categories beyond the sixth. For multi-word notcategories replace spaces with _, e.g. notcategory=Science_and_Technology.
Options: string value
Default value = null
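The interplay of category and notcategory can be sketched as an in-memory filter. This is an illustration only, not the extension's query: `select_articles` and the data layout are hypothetical. It assumes that an article must be a member of every listed category (intersection), and that the limit of six applies separately to each list; the text does not confirm either point.

```python
MAX_CATEGORIES = 6  # configurable limit mentioned in the text


def select_articles(articles, categories=("Published",), notcategories=()):
    """Return titles that are in all `categories` and in none of `notcategories`.

    `articles` maps a title to the list of categories it belongs to.
    """
    # Entries beyond the limit are silently ignored.
    wanted = list(categories)[:MAX_CATEGORIES]
    unwanted = list(notcategories)[:MAX_CATEGORIES]
    selected = []
    for title, member_of in articles.items():
        member_of = set(member_of)
        in_all_wanted = all(c in member_of for c in wanted)
        in_any_unwanted = any(c in member_of for c in unwanted)
        if in_all_wanted and not in_any_unwanted:
            selected.append(title)
    return selected
```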
order
http://domain/wiki/Special:SpecialGNSM/order=[descending/ascending]
Sorts returns in either ascending or descending order, based on the ordermethod.
Options: ascending || descending
Default value: descending
ordermethod
http://domain/wiki/Special:SpecialGNSM/ordermethod=[lastedit/qualitypages/categoryadd]
Returns the found articles sorted by when they were last edited, by the qualitypage rating (using Flagged Revisions), or by the timestamp when they were first added to the first (or default) category.
Options: lastedit || qualitypages || categoryadd
Default value = categoryadd
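How order and ordermethod combine can be shown with a small sort over in-memory records. The record fields (`last_edit`, `category_added`, `quality`) and the function name are hypothetical stand-ins for whatever the extension actually reads; only the parameter values and defaults come from the text.

```python
# Map each ordermethod value to a (hypothetical) field of an article record.
SORT_FIELDS = {
    "lastedit": "last_edit",
    "categoryadd": "category_added",
    "qualitypages": "quality",
}


def sort_articles(articles, ordermethod="categoryadd", order="descending"):
    """Return a new list of article dicts sorted by the chosen method."""
    field = SORT_FIELDS[ordermethod]
    return sorted(articles, key=lambda a: a[field], reverse=(order == "descending"))
```

With the defaults, the article most recently added to the category comes first.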
qualitypages
http://domain/wiki/Special:SpecialGNSM/qualitypages=[only/include/exclude]
If the Flagged Revisions extension is installed, excludes, returns only, or ignores whether an article's quality rating is >1.
Options: include || only || exclude
Default value: null
redirects
http://domain/wiki/Special:SpecialGNSM/redirects=[exclude/include/only]
Excludes, returns only, or ignores whether an article is a redirect.
Options: include || only || exclude
Default value: exclude
stablepages
http://domain/wiki/Special:SpecialGNSM/stablepages=[only/exclude/include]
If the Flagged Revisions extension is installed, excludes, returns only, or ignores whether an article has a stable revision.
Options: include || only || exclude
Default value: only
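The qualitypages, redirects, and stablepages parameters share the same three-way include/only/exclude semantics, which can be sketched as one small predicate. The helper name is hypothetical, and treating a null (unset) value like include is an assumption.

```python
def tristate_keep(mode, has_property):
    """Decide whether an article passes an include/only/exclude filter.

    `has_property` says whether the article is, for example, a redirect
    or has a stable revision.
    """
    if mode == "only":
        return has_property
    if mode == "exclude":
        return not has_property
    # "include" (or an unset value): the property is ignored.
    return True
```

For example, with the default redirects=exclude a redirect page fails the filter, while with stablepages=only an article passes only if it has a stable revision.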
Planned improvements
AKA ToDo List.
- Determine last modification date (Sitemap feed) (31 Oct 2009)
- Determine keywords from all category memberships (Sitemap feed) (31 Oct 2009)
- Filter category members
- - dates
- - Published
- Develop priority criteria and implement (Sitemap feed)
- Age of article?
- Additional ordermethod?
- Develop qualitypages as an ordermethod (02 Nov 2009)
- Develop curid urls
- Remove useNamespace DPL cruft
- Graceful error fails
- - When category is empty, close root xml element. (reported 2 Nov 2009 Bawolff; tentatively marked fixed 2 Nov 2009)
- - Graceful fail when category param present but empty (clean up debug code) (reported 2 Nov 2009 Bawolff; tentatively marked fixed 2 Nov 2009)
- - Both the above errors probably also apply to notcategory (tentatively marked fixed 2 Nov 2009)
- Add GN bool param to limit ts > ( ts_now - 3 days ) [high priority] (01 Nov 2009)
- - Make it configurable. (01 Nov 2009)
- - Make it actually work. (02 Nov 2009)
- Default author = $wgSitename (02 Nov 2009)
- Normalize feedSMItem to feedItem parameters, so $wgFeedClasses can be implemented. (02 Nov 2009)
- Not sure this is worth the effort as the feeds are quite different.