root/Oni2/Validate External Links/Documentation/Read Me.rtf
Revision: 1157
Committed: Sun May 9 21:53:48 2021 UTC by iritscen
Content type: application/rtf
File size: 6822 byte(s)
Log Message:
ValExtLinks: Make sure that bad YT links count as NG. Various tweaks to project organization.

File Contents

# Content
1 {\rtf1\ansi\ansicpg1252\cocoartf2578
2 \cocoatextscaling0\cocoaplatform0{\fonttbl\f0\fswiss\fcharset0 Helvetica-Bold;\f1\fswiss\fcharset0 Helvetica;}
3 {\colortbl;\red255\green255\blue255;}
4 {\*\expandedcolortbl;;}
5 \margl1440\margr1440\vieww12820\viewh10560\viewkind0
6 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\qc\partightenfactor0
7
8 \f0\b\fs28 \cf0 About "Validate External Links"
9 \f1\b0 \
10 developed by Iritscen ({\field{\*\fldinst{HYPERLINK "http://iritscen.oni2.net"}}{\fldrslt http://iritscen.oni2.net}})\
11 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0
12 \cf0 \
13 \ul Introduction\ulnone \
14 Validate External Links ("ValExtLinks" for short) is a Bash shell script for validating large numbers of external links on a wiki. It was developed on a Mac, but it should run in any Bash shell from v3.2 onward. Non-Bash shells won't work, as the script uses a lot of Bash-specific syntax. The Unix binaries that the script invokes are mostly standard, but you should make sure that "curl" and "expect" are installed.\
15 \
16 This read-me does not tell you how to use the script; running the script with the --help option should give you that information. Instead, it draws your attention to the items that you will need to adapt to your system before you can use ValExtLinks, and provides information for developing and testing it.\
17 \
18 \ul Execution\ulnone \
19 The following files are in the
20 \f0\b main directory
21 \f1\b0 , in reverse-alphabetical order:\
22 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\li956\fi-957\pardirnatural\partightenfactor0
23 \cf0 \'95
24 \f0\b validate_external_links.sh
25 \f1\b0 is the main script.\
26 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\li1662\fi-1664\pardirnatural\partightenfactor0
27 \cf0 \'95 The AGENT variable should contain a reasonably up-to-date user agent string, preferably generated from Google Chrome since that is the browser used to take screenshots of pages. Various web sites will show you the user agent string when you visit them in the browser, or you can simply upload the supplied print_user_agent.php (see the Development folder) and then visit it.\
28 \'95 WIKI_CURL and WIKI_HTTP need to be set to the location of the pages that contain "curl" error codes and HTTP response codes, respectively. See the Documentation folder for copies of these pages.\
29 \'95 WIKI_MAIN should be set to the location of the main documentation.\
30 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0
31 \cf0 \'95
32 \f0\b validate_external_links.command
33 \f1\b0 is intended to be the means of running ValExtLinks. The idea is to make sure this file is executable and then to double-click it (or invoke it with "cron") when you want to run ValExtLinks.\
34 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\li1662\fi-1664\pardirnatural\partightenfactor0
35 \cf0 \'95 You can use the _LOCAL variables to supply the exact links and exception files that you want the script to process. Sample alternate invocations of ValExtLinks are also provided. Currently, the extlinks.csv file hosted on oni2.net refreshes twice a day, at 06:20 and 14:20 GMT.\
36 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\li956\fi-957\pardirnatural\partightenfactor0
37 \cf0 \'95
38 \f0\b val_expect_sftp.txt
39 \f1\b0 is an "expect" script which performs the actual upload of the ValExtLinks report. ValExtLinks was written to use SFTP because Oni2.net does not support regular FTP.\
40 \'95
41 \f0\b sftp_login.txt
42 \f1\b0 should be populated with your SFTP login info and the path that you want the report uploaded to. ValExtLinks reads this file when it runs val_expect_sftp.txt to invoke "sftp".\
43 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0
44 \cf0 \
45 For your reference, the
46 \f0\b Sample files
47 \f1\b0 folder contains:\
48 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\li937\fi-938\pardirnatural\partightenfactor0
49 \cf0 \'95
50 \f0\b exceptions.txt
51 \f1\b0 is a sample of how the exceptions list should look, formatted as a MediaWiki page. Anything before "BEGIN LIST" and after "END LIST" is ignored, so a local plain-text file that only contains those keywords and a list of exceptions should work just as well.\
52 \'95
53 \f0\b extlinks.csv
54 \f1\b0 is a sample of oni2.net's external link table dump.\
55 \'95
56 \f0\b ValExtLinks report.htm/.rtf/.txt
57 \f1\b0 is a sample of the output you should get from ValExtLinks.\
58 \
59 The
60 \f0\b Documentation
61 \f1\b0 folder contains:\
62 \'95
63 \f0\b curl codes.txt
64 \f1\b0 lists the possible error codes that "curl" can return, formatted for MediaWiki.\
65 \'95
66 \f0\b HTTP codes.txt
67 \f1\b0 lists the HTTP response codes that ValExtLinks understands, formatted for MediaWiki.\
68 \'95
69 \f0\b License.txt
70 \f1\b0 is a standard copy of the MIT license, and applies to the whole project.\
71 \'95
72 \f0\b Read Me.rtf
73 \f1\b0 is this file!\
74 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0
75 \cf0 \
76 \ul Development\ulnone \
77 For testing and development, the
78 \f0\b Development
79 \f1\b0 folder contains:\
80 \'95
81 \f0\b Get script line count.command
82 \f1\b0 reports the script's line count.\
83 \'95
84 \f0\b print_user_agent.php
85 \f1\b0 can be used for setting the AGENT variable, as mentioned above.\
86 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\li950\fi-951\pardirnatural\partightenfactor0
87 \cf0 \'95
88 \f0\b Sample Archive response.txt
89 \f1\b0 is a sample of the output of the Internet Archive's snapshot availability API.\
90 \'95
91 \f0\b Sample header - OK.txt
92 \f1\b0 is a sample of what "curl" sees when it is run with the --head option and gets an OK (200) response.\
93 \'95
94 \f0\b Sample header - redirect.txt
95 \f1\b0 is a sample of what "curl" sees when it is run with the --head option and gets a redirection (302) response.\
96 \'95
97 \f0\b ValExtLinks to-do.rtf
98 \f1\b0 is the development to-do list.\
99 \'95
100 \f0\b YouTube - video_*.txt
101 \f1\b0 are files with sample page source from videos that are NG (no good), used to teach Val how to recognize bad YouTube links.\
102 \'95
103 \f0\b YouTube bad link detection.rtf
104 \f1\b0 contains the links for the videos that the page source samples are from.\
105 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0
106 \cf0 \
107 All right, that's all. Have fun fixing those external links!}