This bot will make direct text replacements.

It will retrieve information on which pages might need changes either from
an XML dump or a text file, or only change a single page.

These command line parameters can be used to specify which pages to work on:

GENERATOR OPTIONS
=================

-cat Work on all pages which are in a specific category.
Argument can also be given as "-cat:categoryname" or
as "-cat:categoryname|fromtitle" (using # instead of |
is also allowed in this one and the following)

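For example, a minimal category-driven run might look like this (the
category name "Typos" and the replacement pair are purely illustrative):

python pwb.py replace -cat:Typos "colour" "color"
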
-catr Like -cat, but also recursively includes pages in
subcategories, sub-subcategories etc. of the
given category.
Argument can also be given as "-catr:categoryname" or
as "-catr:categoryname|fromtitle".

-subcats Work on all subcategories of a specific category.
Argument can also be given as "-subcats:categoryname" or
as "-subcats:categoryname|fromtitle".

-subcatsr Like -subcats, but also includes sub-subcategories etc. of
the given category.
Argument can also be given as "-subcatsr:categoryname" or
as "-subcatsr:categoryname|fromtitle".

-uncat Work on all pages which are not categorised.

-uncatcat Work on all categories which are not categorised.

-uncatfiles Work on all files which are not categorised.

-file Read a list of pages to treat from the named text file.
Page titles in the file may be either enclosed with
[[brackets]], or be separated by new lines.
Argument can also be given as "-file:filename".

-filelinks Work on all pages that use a certain image/media file.
Argument can also be given as "-filelinks:filename".

-search Work on all pages that are found in a MediaWiki search
across all namespaces.

-logevents Work on articles that were on a specified Special:Log.
The value may be a comma-separated list of these values:

logevent,username,start,end

or for backward compatibility:

logevent,username,total

Note: 'start' is the most recent date and log events are
iterated from present to past. If 'start' is not provided,
it means 'now'; if 'end' is not provided, it means 'since
the beginning'.

To use the default value, use an empty string.
The log event parameter accepts every log type offered by
the site; it may be one of the following:

spamblacklist, titleblacklist, gblblock, renameuser,
globalauth, gblrights, gblrename, abusefilter,
massmessage, thanks, usermerge, block, protect, rights,
delete, upload, move, import, patrol, merge, suppress,
tag, managetags, contentmodel, review, stable,
timedmediahandler, newusers

By default, 10 pages are retrieved.

Examples:

-logevents:move gives pages from the move log (usually
redirects)
-logevents:delete,,20 gives 20 pages from the deletion log
-logevents:protect,Usr gives pages from the protect log by user
Usr
-logevents:patrol,Usr,20 gives 20 patrolled pages by Usr
-logevents:upload,,20121231,20100101 gives upload pages
from 1 Jan 2010 up to 31 Dec 2012
-logevents:review,,20121231 gives review pages from the
beginning up to 31 Dec 2012
-logevents:review,Usr,20121231 gives review pages by user
Usr from the beginning up to 31 Dec 2012

In some cases the whole argument must be quoted, as in
-logevents:"move,Usr,20"

-interwiki Work on the given page and all equivalent pages in other
languages. This can, for example, be used to fight
multi-site spamming.
Attention: this will cause the bot to modify
pages on several wiki sites; this is not well tested,
so check your edits!

-links Work on all pages that are linked from a certain page.
Argument can also be given as "-links:linkingpagetitle".

-liverecentchanges Work on pages from the live recent changes feed. If used as
-liverecentchanges:x, work on x recent changes.

-imagesused Work on all images that are contained on a certain page.
Can also be given as "-imagesused:linkingpagetitle".

-newimages Work on the most recent new images. If given as
-newimages:x, will work on x newest images.

-newpages Work on the most recent new pages. If given as -newpages:x,
will work on x newest pages.

-recentchanges Work on the pages with the most recent changes. If
given as -recentchanges:x, will work on the x most recently
changed pages. If given as -recentchanges:offset,duration,
it will work on pages changed within a timespan of
'duration' minutes, starting 'offset' minutes back from now.
rctags are supported too.
The rctag must be the very first parameter part.

Examples:

-recentchanges:20 gives the 20 most recently changed pages
-recentchanges:120,70 will give pages with an offset of
120 minutes and a timespan of 70 minutes
-recentchanges:visualeditor,10 gives the 10 most recently
changed pages marked with 'visualeditor'
-recentchanges:"mobile edit,60,35" will retrieve pages
marked with 'mobile edit' for the given offset and timespan

-unconnectedpages Work on the most recent pages that are not connected to
the Wikibase repository. Given as -unconnectedpages:x, will
work on the x most recent unconnected pages.

-ref Work on all pages that link to a certain page.
Argument can also be given as "-ref:referredpagetitle".

-start Specifies that the robot should go alphabetically through
all pages on the home wiki, starting at the named page.
Argument can also be given as "-start:pagetitle".

You can also include a namespace. For example,
"-start:Template:!" will make the bot work on all pages
in the template namespace.

The default value is "-start:!".

-prefixindex Work on pages commencing with a common prefix.

-transcludes Work on all pages that use a certain template.
Argument can also be given as "-transcludes:Title".

-unusedfiles Work on all description pages of images/media files that
are not used anywhere.
Argument can be given as "-unusedfiles:n" where
n is the maximum number of articles to work on.

-lonelypages Work on all articles that are not linked from any other
article.
Argument can be given as "-lonelypages:n" where
n is the maximum number of articles to work on.

-unwatched Work on all articles that are not watched by anyone.
Argument can be given as "-unwatched:n" where
n is the maximum number of articles to work on.

-property:name Work on all pages with a given property name from
Special:PagesWithProp.

-usercontribs Work on all articles that were edited by a certain user.
(Example: -usercontribs:DumZiBoT)

-weblink Work on all articles that contain an external link to
a given URL; may be given as "-weblink:url"

-withoutinterwiki Work on all pages that don't have interlanguage links.
Argument can be given as "-withoutinterwiki:n" where
n is the total to fetch.

-mysqlquery Takes a MySQL query string like
"SELECT page_namespace, page_title FROM page
WHERE page_namespace = 0" and treats
the resulting pages. See
https://www.mediawiki.org/wiki/Manual:Pywikibot/MySQL
for more details.

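A complete invocation might look like the following sketch (the added
LIMIT clause and the replacement pair are illustrative only; this
requires access to a database mirror as described on the page above):

python pwb.py replace -mysqlquery:"SELECT page_namespace, page_title FROM page WHERE page_namespace = 0 LIMIT 100" "Errror" "Error"
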
-sparql Takes a SPARQL SELECT query string including ?item
and works on the resulting pages.

-sparqlendpoint Specify SPARQL endpoint URL (optional).
(Example: -sparqlendpoint:http://myserver.com/sparql)

-searchitem Takes a search string and works on Wikibase pages that
contain it.
Argument can be given as "-searchitem:text", where text
is the string to look for, or "-searchitem:lang:text",
where lang is the language to search items in.

-wantedpages Work on pages that are linked, but do not exist;
may be given as "-wantedpages:n" where n is the maximum
number of articles to work on.

-wantedcategories Work on categories that are used, but do not exist;
may be given as "-wantedcategories:n" where n is the
maximum number of categories to work on.

-wantedfiles Work on files that are used, but do not exist;
may be given as "-wantedfiles:n" where n is the maximum
number of files to work on.

-wantedtemplates Work on templates that are used, but do not exist;
may be given as "-wantedtemplates:n" where n is the
maximum number of templates to work on.

-random Work on random pages returned by [[Special:Random]].
Can also be given as "-random:n" where n is the number
of pages to be returned.

-randomredirect Work on random redirect pages returned by
[[Special:RandomRedirect]]. Can also be given as
"-randomredirect:n" where n is the number of pages to be
returned.

-google Work on all pages that are found in a Google search.
You need a Google Web API license key. Note that Google
doesn't give out license keys anymore. See google_key in
config.py for instructions.
Argument can also be given as "-google:searchstring".

-page Work on a single page. Argument can also be given as
"-page:pagetitle", and supplied multiple times for
multiple pages.

-pageid Work on a single pageid. Argument can also be given as
"-pageid:pageid1,pageid2,." or
"-pageid:'pageid1|pageid2|..'"
and supplied multiple times for multiple pages.

-linter Work on pages that contain lint errors. The Linter
extension must be available on the site.
-linter selects all categories.
-linter:high, -linter:medium or -linter:low selects all
categories of that priority.
Single categories can be selected with commas as in
-linter:cat1,cat2,cat3

Appending '/int' specifies the lint ID to start querying
from, e.g. -linter:high/10000

-linter:show just shows the available categories.

-querypage:name Work on pages provided by a QueryPage-based special page,
see https://www.mediawiki.org/wiki/API:Querypage.
(tip: use -limit:n to fetch only n pages).

-querypage shows special pages available.


FILTER OPTIONS
==============

-catfilter Filter the page generator to only yield pages in the
specified category. See -cat generator for argument format.

-grep A regular expression that needs to match the article,
otherwise the page won't be returned.
Multiple -grep:regexpr can be provided and the page will
be returned if its content is matched by any of the regexpr
provided.
Case insensitive regular expressions will be used and
dot matches any character, including a newline.

-grepnot Like -grep, but return the page only if the regular
expression does not match.

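For instance, -grep can restrict a replacement to pages whose text
matches some context (the pattern and the replacement pair below are
purely illustrative):

python pwb.py replace -start:! -grep:"[Ii]nfobox" "colour" "color"
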
-intersect Work on the intersection of all the provided generators.

-limit When used with any other argument, -limit:n restricts the
run to no more than n pages in total.

-namespaces Filter the page generator to only yield pages in the
-namespace specified namespaces. Separate multiple namespace
-ns numbers or names with commas.

Examples:

-ns:0,2,4
-ns:Help,MediaWiki

You may use a preceding "not" to exclude the namespace.

Examples:

-ns:not:2,3
-ns:not:Help,File

If used with the -newpages/-random/-randomredirect/-linter
generators, -namespace/ns must be provided before
-newpages/-random/-randomredirect/-linter.
If used with the -recentchanges generator, efficiency is
improved if -namespace is provided before -recentchanges.

If used with the -start generator, -namespace/ns shall contain
only one value.

-onlyif A claim the page needs to contain, otherwise the item won't
be returned.
The format is property=value,qualifier=value. Multiple (or
none) qualifiers can be passed, separated by commas.

Examples:

P1=Q2 (property P1 must contain value Q2),
P3=Q4,P5=Q6,P6=Q7 (property P3 with value Q4 and
qualifiers: P5 with value Q6 and P6 with value Q7).
Value can be page ID, coordinate in format:
latitude,longitude[,precision] (all values are in decimal
degrees), year, or plain string.
The argument can be provided multiple times and the item
page will be returned only if all claims are present.
Argument can be also given as "-onlyif:expression".

-onlyifnot A claim the page must not contain, otherwise the item won't
be returned.
For usage and examples, see -onlyif above.

-ql Filter pages based on page quality.
This is only applicable if the content model equals
'proofread-page'; otherwise it has no effect.
Valid values are in range 0-4.
Multiple values can be comma-separated.

-subpage -subpage:n filters pages to only those that have depth n,
i.e. a depth of 0 filters out all pages that are subpages,
and a depth of 1 filters out all pages that are subpages of
subpages.

-titleregex A regular expression that needs to match the article title,
otherwise the page won't be returned.
Multiple -titleregex:regexpr can be provided and the page
will be returned if its title is matched by any of the
regexpr provided.
Case insensitive regular expressions will be used and
dot matches any character.

-titleregexnot Like -titleregex, but return the page only if the regular
expression does not match.

Furthermore, the following command line parameters are supported:

-mysqlquery Retrieve information from a local database mirror.
If no query is specified, the bot searches for pages
matching the given replacements.

-xml Retrieve information from a local XML dump
(pages-articles or pages-meta-current, see
https://dumps.wikimedia.org). Argument can also
be given as "-xml:filename".

-regex Make replacements using regular expressions. If this argument
isn't given, the bot will make simple text replacements.

-nocase Use case insensitive regular expressions.

-dotall Make the dot match any character at all, including a newline.
Without this flag, '.' will match anything except a newline.

-multiline '^' and '$' will now match begin and end of each line.

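For instance, a case-insensitive regular-expression run over all pages
might look like this (the pattern and replacement are illustrative only):

python pwb.py replace -regex -nocase "\berrror\b" "error" -start:!
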
-xmlstart (Only works with -xml) Skip all articles in the XML dump
before the one specified (may also be given as
-xmlstart:Article).

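For example, to resume a dump scan at a given article (the dump file
name and the article title are illustrative only):

python pwb.py replace -xml:dump.xml -xmlstart:Mozart "Errror" "Error"
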
-addcat:cat_name Adds "cat_name" category to every altered page.

-excepttitle:XYZ Skip pages with titles that contain XYZ. If the -regex
argument is given, XYZ will be regarded as a regular
expression.

-requiretitle:XYZ Only do pages with titles that contain XYZ. If the -regex
argument is given, XYZ will be regarded as a regular
expression.

-excepttext:XYZ Skip pages which contain the text XYZ. If the -regex
argument is given, XYZ will be regarded as a regular
expression.

-exceptinside:XYZ Skip occurrences of the to-be-replaced text which lie
within XYZ. If the -regex argument is given, XYZ will be
regarded as a regular expression.

-exceptinsidetag:XYZ Skip occurrences of the to-be-replaced text which lie
within an XYZ tag.

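As an illustration, the following sketch skips matches inside wiki links
and inside nowiki tags (the pattern is illustrative only, and it assumes
'nowiki' is among the tag names the bot recognises):

python pwb.py replace -regex "referer" "referrer" -exceptinside:"\[\[.*?\]\]" -exceptinsidetag:nowiki -start:!
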
-summary:XYZ Set the summary message text for the edit to XYZ, bypassing
the predefined message texts with original and replacements
inserted. Can't be used with -automaticsummary.

-automaticsummary Uses an automatic summary for all replacements which don't
have a summary defined. Can't be used with -summary.

-sleep:123 If you use -fix, multiple regexes may be checked on every
page. This can waste a lot of CPU because the bot checks
every regex one after the other without pausing. This option
makes the bot wait between one regex and the next so that it
does not use too much CPU.

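For example, a throttled run of a predefined fix might look like this
(the sleep value and the choice of fix are illustrative only):

python pwb.py replace -fix:syntax-safe -sleep:5 -start:!
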
-fix:XYZ Perform one of the predefined replacement tasks, which are
given in the dictionary 'fixes' defined inside the files
fixes.py and user-fixes.py.

Currently available predefined fixes are:

* HTML - Convert HTML tags to wiki syntax, and
fix XHTML.
* isbn - Fix badly formatted ISBNs.
* syntax - Try to fix bad wiki markup. Do not run
this in automatic mode, as the bot may
make mistakes.
* syntax-safe - Like syntax, but less risky, so you can
run this in automatic mode.
* case-de - fix upper/lower case errors in German
* grammar-de - fix grammar and typography in German
* vonbis - replace hyphens/dashes with "bis"
in German
* music - fix links to disambiguation pages in German
* datum - specific date formats in German
* correct-ar - Typo corrections for Arabic Wikipedia and any
Arabic wiki.
* yu-tld - Fix links to .yu domains because the TLD has
been disabled, see:
https://lists.wikimedia.org/pipermail/wikibots-l/2009-February/000290.html
* fckeditor - Try to convert FCKeditor HTML tags to wiki
syntax.

-manualinput Request manual replacements via the command line input even
if replacements are already defined. If this option is set
(or no replacements are defined via -fix or the arguments),
the bot will ask for additional replacements at start.

-pairsfile Lines from the given file name(s) will be read as replacement
arguments, i.e. a file containing lines "a" and "b", used as:

python pwb.py replace -page:X -pairsfile:file c d

will replace 'a' with 'b' and 'c' with 'd'.

-always Don't prompt you for each replacement.

-recursive Recurse replacement as long as possible. Be careful, this
might lead to an infinite loop.

-allowoverlap When occurrences of the pattern overlap, replace all of them.
Be careful, this might lead to an infinite loop.

-fullsummary Use one large summary for all command line replacements.

other: First argument is the old text, second argument is the new
text. If the -regex argument is given, the first argument
will be regarded as a regular expression, and the second
argument might contain expressions like \1 or \g<name>.
It is possible to introduce more than one pair of old text
and replacement.

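For instance, a replacement using a named group might look like this
(the page title, pattern and substitution are illustrative only):

python pwb.py replace -regex "(?P<value>\d+) metres" "\g<value> m" -page:Example
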
Examples
--------

If you want to change templates from the old syntax, e.g. {{msg:Stub}}, to the
new syntax, e.g. {{Stub}}, download an XML dump file (pages-articles) from
https://dumps.wikimedia.org, then use this command:

python pwb.py replace -xml -regex "{{msg:(.*?)}}" "{{\1}}"

If you have a dump called foobar.xml and want to fix typos in articles, e.g.
Errror -> Error, use this:

python pwb.py replace -xml:foobar.xml "Errror" "Error" -namespace:0

If you want to do more than one replacement at a time, use this:

python pwb.py replace -xml:foobar.xml "Errror" "Error" "Faail" "Fail" \
    -namespace:0

If you have a page called 'John Doe' and want to fix the format of ISBNs, use:

python pwb.py replace -page:John_Doe -fix:isbn

This command will change 'referer' to 'referrer', but not in pages which
talk about HTTP, where the typo has become part of the standard:

python pwb.py replace referer referrer -file:typos.txt -excepttext:HTTP

Please type "python pwb.py replace -help | more" if you can't read
the top of the help.

GLOBAL OPTIONS
==============

For global options use -help:global or run pwb.py -help