| 1 |
|
| 2 |
|
|
<html>
|
| 3 |
|
|
<head>
|
| 4 |
|
|
<title>forgottenislanderbot -- Documentation</title>
|
| 5 |
|
|
<style type="text/css">
|
| 6 |
|
|
body {font-family: Helvetica, Arial, sans-serif;}
|
| 7 |
|
|
.toc li {list-style:none;}
|
| 8 |
|
|
div.cmd {font-family:"Courier New", Courier, monospace; font-size: 14pt;text-indent:2em;background-color:#BBBBBB;}
|
| 9 |
|
|
.attn {background-color:#ff0000;}
|
| 10 |
|
|
.footer {color:#999999;text-align:center;}
|
| 11 |
|
|
</style>
|
| 12 |
|
|
</head>
|
| 13 |
|
|
<body>
|
| 14 |
|
|
<a name="0" />
|
| 15 |
|
|
<h1>forgottenislanderbot -- Documentation</h1>
|
| 16 |
|
|
<p>
|
| 17 |
|
|
forgottenislanderbot is a set of computer programs for
|
| 18 |
|
|
gathering data on Bell Aliant's <abbr title="Digital Subscriber Loop">DSL</abbr> high-speed internet
|
| 19 |
|
|
coverage on PEI by querying Bell Aliant's website, and
|
| 20 |
|
|
displaying the data on a virtual globe, like Google
|
| 21 |
|
|
Earth.
|
| 22 |
|
|
</p>
|
| 23 |
|
|
<hr />
|
| 24 |
|
|
<a name="1" />
|
| 25 |
|
|
<h2>1. Table of Contents</h2>
|
| 26 |
|
|
<div class="toc">
|
| 27 |
|
|
<ul>
|
| 28 |
|
|
<li>
|
| 29 |
|
|
<a href="#1">1. Table of contents</a>
|
| 30 |
|
|
</li>
|
| 31 |
|
|
<li>
|
| 32 |
|
|
<a href="#2">2. Overview</a>
|
| 33 |
|
|
</li>
|
| 34 |
|
|
<li>
|
| 35 |
|
|
<a href="#3">3. Installation</a>
|
| 36 |
|
|
</li>
|
| 37 |
|
|
<li>
|
| 38 |
|
|
<a href="#4">4. Use</a>
|
| 39 |
|
|
<ol>
|
| 40 |
|
|
<li>
|
| 41 |
|
|
<a href="#4.1">4.1. Using the command-line</a>
|
| 42 |
|
|
</li>
|
| 43 |
|
|
<li>
|
| 44 |
|
|
<a href="#4.2">4.2. Getting the civic-address database</a>
|
| 45 |
|
|
<ol>
|
| 46 |
|
|
<li>
|
| 47 |
|
|
<a href="#4.2.1">4.2.1. Downloading the raw data</a>
|
| 48 |
|
|
</li>
|
| 49 |
|
|
<li>
|
| 50 |
|
|
<a href="#4.2.2">4.2.2. Converting to a <abbr title="ForgottenIslanderBot">FIB</abbr> database</a>
|
| 51 |
|
|
</li>
|
| 52 |
|
|
</ol>
|
| 53 |
|
|
</li>
|
| 54 |
|
|
<li>
|
| 55 |
|
|
<a href="#4.3">4.3. Crawling for <abbr title="Digital Subscriber Loop">DSL</abbr> coverage</a>
|
| 56 |
|
|
<ol>
|
| 58 |
|
|
<li>
|
| 59 |
|
|
<a href="#4.3.1">4.3.1. Robot etiquette</a>
|
| 60 |
|
|
</li>
|
| 61 |
|
|
<li>
|
| 62 |
|
|
<a href="#4.3.2">4.3.2. Planning the crawling</a>
|
| 63 |
|
|
</li>
|
| 64 |
|
|
<li>
|
| 65 |
|
|
<a href="#4.3.3">4.3.3. Running the bot</a>
|
| 66 |
|
|
</li>
|
| 67 |
|
|
</ol>
|
| 68 |
|
|
</li>
|
| 69 |
|
|
<li>
|
| 70 |
|
|
<a href="#4.4">4.4. Generating the map</a>
|
| 71 |
|
|
<ol>
|
| 72 |
|
|
<li>
|
| 73 |
|
|
<a href="#4.4.1">4.4.1. Producing the <abbr title="Keyhole Markup Language">KML</abbr> file</a>
|
| 74 |
|
|
</li>
|
| 75 |
|
|
<li>
|
| 76 |
|
|
<a href="#4.4.2">4.4.2. Using the <abbr title="Keyhole Markup Language">KML</abbr> with Google Earth</a>
|
| 77 |
|
|
</li>
|
| 78 |
|
|
<li>
|
| 79 |
|
|
<a href="#4.4.3">4.4.3. Distributing the map</a>
|
| 80 |
|
|
</li>
|
| 81 |
|
|
</ol>
|
| 82 |
|
|
</li>
|
| 83 |
|
|
</ol>
|
| 84 |
|
|
</li>
|
| 85 |
|
|
<li>
|
| 86 |
|
|
<a href="#5">5. Licenses</a>
|
| 87 |
|
|
</li>
|
| 88 |
|
|
</ul>
|
| 89 |
|
|
</div>
|
| 90 |
|
|
<a href="#0">[ top ]</a>
|
| 91 |
|
|
<hr />
|
| 92 |
|
|
|
| 93 |
|
|
<a name="2" />
|
| 94 |
|
|
<h2>2. Overview</h2>
|
| 95 |
|
|
<p>
|
| 96 |
|
|
ForgottenIslanderBot (<abbr title="ForgottenIslanderBot">FIB</abbr>) was written in January 2010 after it
|
| 97 |
|
|
became apparent Bell Aliant was not expanding their <abbr title="Digital Subscriber Loop">DSL</abbr> coverage
|
| 98 |
|
|
across the whole of PEI, and would be offering their 3G modem as
|
| 99 |
|
|
an alternative. Due to numerous concerns about the 3G modems,
|
| 100 |
|
|
and vagueness surrounding the number of addresses <abbr title="Digital Subscriber Loop">DSL</abbr> would not
|
| 101 |
|
|
be available at, the author realized the need for a map of Bell
|
| 102 |
|
|
Aliant's <abbr title="Digital Subscriber Loop">DSL</abbr> coverage, and that such a map would be possible to make.
|
| 103 |
|
|
</p>
|
| 104 |
|
|
<p>
|
| 105 |
|
|
The bot uses the Check Availability tool on Bell Aliant's
|
| 106 |
|
|
website to find out whether addresses in the PEI Civic Address
|
| 107 |
|
|
Database are eligible for <abbr title="Digital Subscriber Loop">DSL</abbr>. This data is only as good as the
|
| 108 |
|
|
website's database, but may be better than any other map
|
| 109 |
|
|
available. To make a map, <abbr title="ForgottenIslanderBot">FIB</abbr> generates a <abbr title="Keyhole Markup Language">KML</abbr> overlay, which can
|
| 110 |
|
|
be displayed in virtual globe software, like Google Earth.
|
| 111 |
|
|
</p>
|
| 112 |
|
|
<p>
|
| 113 |
|
|
forgottenislanderbot is free, open-source software under the GNU
|
| 114 |
|
|
General Public License.
|
| 115 |
|
|
</p>
|
| 116 |
|
|
<a href="#0">[ top ]</a>
|
| 117 |
|
|
<hr />
|
| 118 |
|
|
|
| 119 |
|
|
<a name="3" />
|
| 120 |
|
|
<h2>3. Installation</h2>
|
| 121 |
|
|
<p>
|
| 122 |
|
|
<abbr title="ForgottenIslanderBot">FIB</abbr> is written in the
|
| 123 |
|
|
<a href="http://www.python.org/">Python programming language</a>.
|
| 124 |
|
|
To run <abbr title="ForgottenIslanderBot">FIB</abbr>,
|
| 125 |
|
|
you will need a computer with the Python software and a fast internet
|
| 126 |
|
|
connection which you can leave running overnight.
|
| 127 |
|
|
</p>
|
| 128 |
|
|
<p>
|
| 129 |
|
|
The Python <i>interpreter</i>, which is required to run <abbr title="ForgottenIslanderBot">FIB</abbr>, can
|
| 130 |
|
|
be downloaded for free from the internet. Python is available
|
| 131 |
|
|
for most recent operating systems. Forgottenislanderbot requires
|
| 132 |
|
|
Python version 2.5 or 2.6 -- it does not work with the newest
|
| 133 |
|
|
version, 3.1, yet. Python 2.6 can be downloaded from
|
| 134 |
|
|
<a href="http://www.python.org/download/releases/2.6.4/">
|
| 135 |
|
|
http://www.python.org/download/releases/2.6.4/</a>.
|
| 136 |
|
|
</p>
|
| 137 |
|
|
<p>
|
| 138 |
|
|
To display the map on a virtual globe, you will need virtual
|
| 139 |
|
|
globe software, such as <a href="http://earth.google.com/">Google Earth</a>
|
| 140 |
|
|
or <a href="http://worldwind.arc.nasa.gov/">NASA World Wind.</a>
|
| 141 |
|
|
Both of those require a recent computer with a mainstream
|
| 142 |
|
|
operating system, as they use 3D graphics and considerable processor
|
| 143 |
|
|
power and memory.<br />
|
| 144 |
|
|
Note that you do not need to use the <i>same</i> computer to run <abbr title="ForgottenIslanderBot">FIB</abbr> and
|
| 145 |
|
|
display the map.
|
| 146 |
|
|
</p>
|
| 147 |
|
|
<p>
|
| 148 |
|
|
<abbr title="ForgottenIslanderBot">FIB</abbr> can be installed in a folder on your computer, like
|
| 149 |
|
|
<nobr><b>C:\<abbr title="ForgottenIslanderBot">FIB</abbr>\</b></nobr> or <nobr><b>Documents/<abbr title="ForgottenIslanderBot">FIB</abbr>/</b></nobr>,
|
| 150 |
|
|
or it can be run from a folder on a USB drive. To install <abbr title="ForgottenIslanderBot">FIB</abbr>,
|
| 151 |
|
|
unzip the <abbr title="ForgottenIslanderBot">FIB</abbr> zip file into such a folder.
|
| 152 |
|
|
</p>
|
| 153 |
|
|
<a href="#0">[ top ]</a>
|
| 154 |
|
|
<hr />
|
| 155 |
|
|
|
| 156 |
|
|
<a name="4" />
|
| 157 |
|
|
<h2>4. Use</h2>
|
| 158 |
|
|
<a name="4.1" />
|
| 159 |
|
|
<h3>4.1. Using the command-line</h3>
|
| 160 |
|
|
<p>
|
| 161 |
|
|
<abbr title="ForgottenIslanderBot">FIB</abbr> is a <i>command-line program</i> -- it must be run and
|
| 162 |
|
|
controlled from the command line, also known as the command
|
| 163 |
|
|
prompt, terminal, MS-DOS prompt, xterm, virtual terminal, or
|
| 164 |
|
|
shell. Less-young users may recall MS-DOS, or
|
| 165 |
|
|
U<sub>NIX</sub>.<br />
|
| 166 |
|
|
Command-lines in modern operating systems differ; how to access
|
| 167 |
|
|
one, for a few modern OSs, is described below:
|
| 168 |
|
|
<dl>
|
| 169 |
|
|
<dt>Microsoft Windows 95/98<sub>, maybe also ME</sub></dt>
|
| 170 |
|
|
<dd>Start > Programs > Accessories > MS-DOS
|
| 171 |
|
|
Prompt</dd>
|
| 172 |
|
|
<dt>Microsoft Windows XP<sub>, maybe also NT/2000, maybe
|
| 173 |
|
|
Vista</sub></dt>
|
| 174 |
|
|
<dd>Start > All Programs > Accessories >
|
| 175 |
|
|
Command Prompt</dd>
|
| 176 |
|
|
<dt>Apple Mac OS X</dt>
|
| 177 |
|
|
<dd>Finder > Applications > Utilities >
|
| 178 |
|
|
Terminal</dd>
|
| 179 |
|
|
<dt>Linux</dt>
|
| 180 |
|
|
<dd><i>Due to variation, I'll assume most Linux users know
|
| 181 |
|
|
where to find a terminal</i></dd>
|
| 182 |
|
|
</dl>
|
| 183 |
|
|
</p>
|
| 184 |
|
|
<p>
|
| 185 |
|
|
Because <abbr title="ForgottenIslanderBot">FIB</abbr> must be run by Python, on most OSs you'll need to
|
| 186 |
|
|
manually invoke it.
|
| 187 |
|
|
<ol>
|
| 188 |
|
|
<li>Go to your command-line;</li>
|
| 189 |
|
|
<li>Navigate to the folder <abbr title="ForgottenIslanderBot">FIB</abbr> is installed to.<br />
|
| 190 |
|
|
<b>cd \<abbr title="ForgottenIslanderBot">FIB</abbr>\</b> or <b>cd Documents/<abbr title="ForgottenIslanderBot">FIB</abbr>/</b> might work,
|
| 191 |
|
|
depending on where you installed it.</li>
|
| 192 |
|
|
<li>You need to know how to run Python. If it is
|
| 193 |
|
|
installed completely, and in your PATH, running
|
| 194 |
|
|
<div class="cmd">python --version</div> should not
|
| 195 |
|
|
produce an error message. If not, you can either add
|
| 196 |
|
|
Python to your PATH, or run Python with its full path,
|
| 197 |
|
|
for example <div class="cmd">"\Python25\python.exe"
|
| 198 |
|
|
--version</div></li>
|
| 199 |
|
|
<li>Once you can run Python, you can run <abbr title="ForgottenIslanderBot">FIB</abbr> with
|
| 200 |
|
|
commands like the following:
|
| 201 |
|
|
<div class="cmd"><nobr>python fib-dbutil.py database.sqlite --stats yyyy 1 n</nobr></div>
|
| 202 |
|
|
Instructions in this document are presented in this format;
|
| 203 |
|
|
however, if you have to invoke Python differently, you
|
| 204 |
|
|
will have to change the command as needed.
|
| 205 |
|
|
</li>
|
| 206 |
|
|
</ol>
|
| 207 |
|
|
</p>
|
| 208 |
|
|
<a name="4.2" />
|
| 209 |
|
|
<h3>4.2. Getting the civic-address database</h3>
|
| 210 |
|
|
<p>
|
| 211 |
|
|
The bot needs a list of civic addresses to look up on the website.
|
| 212 |
|
|
The PEI government offers the civic address database online,
|
| 213 |
|
|
with no apparent constraints on use. It also has
|
| 214 |
|
|
latitude/longitude coordinates for each address, which
|
| 215 |
|
|
greatly assist in making the map.<br />
|
| 216 |
|
|
Bell Aliant's website seems to be using the same list,
|
| 217 |
|
|
as street addresses appear in exactly the same format as
|
| 218 |
|
|
is used by the provincial database. Google Maps is
|
| 219 |
|
|
probably also using the provincial database, as the
|
| 220 |
|
|
coordinates it finds for a civic address are the same.
|
| 221 |
|
|
</p>
|
| 222 |
|
|
<a name="4.2.1" />
|
| 223 |
|
|
<h4>4.2.1. Downloading the raw data</h4>
|
| 224 |
|
|
<p>
|
| 225 |
|
|
The civic address database has to be downloaded and
|
| 226 |
|
|
converted to a sqlite database before the bot can use
|
| 227 |
|
|
it. To get the database, go to <a href="http://www.gov.pe.ca/civicaddress/download/index.php3">http://www.gov.pe.ca/civicaddress/download/index.php3</a>
|
| 228 |
|
|
and download whichever counties you need (which in most
|
| 229 |
|
|
cases will be all three):
|
| 230 |
|
|
<ol>
|
| 231 |
|
|
<li>Select a county in the drop-down box.</li>
|
| 232 |
|
|
<li>Click <i>Download all addresses in this
|
| 233 |
|
|
county</i>.</li>
|
| 234 |
|
|
<li>It will load a new page.</li>
|
| 235 |
|
|
<li>Make sure <i>Tab-delimited ASCII</i> is
|
| 236 |
|
|
selected as the Download Format.</li>
|
| 237 |
|
|
<li>Make sure <i>Street Number</i>, <i>Street
|
| 238 |
|
|
Name</i>, <i>Community Name</i>, <i>Apartment
|
| 239 |
|
|
Number</i>, <i>County</i>, <i>Latitude</i>,
|
| 240 |
|
|
<i>Longitude</i> are selected. <i>Police
|
| 241 |
|
|
Department</i>, <i>Fire Department</i>, and <i>Ambulance</i>
|
| 242 |
|
|
are unnecessary, but may be selected anyway.</li>
|
| 243 |
|
|
<li>Click <i>Download the Data</i>. You will probably be asked to
|
| 244 |
|
|
save the file; save it to the folder <abbr title="ForgottenIslanderBot">FIB</abbr> is
|
| 245 |
|
|
installed in as <b>[county].tsv</b>.</li>
|
| 246 |
|
|
<li>Repeat for additional counties.</li>
|
| 247 |
|
|
</ol>
|
| 248 |
|
|
</p>
|
| 249 |
|
|
<a name="4.2.2" />
|
| 250 |
|
|
<h4>4.2.2. Converting to a <abbr title="ForgottenIslanderBot">FIB</abbr> database</h4>
|
| 251 |
|
|
<p>
|
| 252 |
|
|
As downloaded, the civic address databases are in
|
| 253 |
|
|
<i>tab-separated-values</i> (TSV) format, where each field, e.g. civic
|
| 254 |
|
|
number, is separated from the next, e.g. street name, by a tab
|
| 255 |
|
|
character, with one address per line. <abbr title="ForgottenIslanderBot">FIB</abbr> works with
|
| 256 |
|
|
<i>sqlite</i> databases, which are files containing data in a
|
| 257 |
|
|
non-human-readable format, but which allow for easier searching,
|
| 258 |
|
|
organizing, and updating of the data.<br />
|
| 259 |
|
|
<abbr title="ForgottenIslanderBot">FIB</abbr> has a command to combine the three TSV files into one sqlite
|
| 260 |
|
|
file. Assuming the TSV files are <b>queens.tsv</b>,
|
| 261 |
|
|
<b>kings.tsv</b>, and <b>prince.tsv</b>, and they are in the
|
| 262 |
|
|
same folder as <abbr title="ForgottenIslanderBot">FIB</abbr>, run:
|
| 263 |
|
|
<div class="cmd"><nobr>python fib-cadb2sql.py database.sqlite queens.tsv kings.tsv prince.tsv</nobr></div>
|
| 266 |
|
|
It should output a file called <b>database.sqlite</b>. When the
|
| 267 |
|
|
bot is run, it will read civic addresses from this file, and
|
| 268 |
|
|
write to it whether <abbr title="Digital Subscriber Loop">DSL</abbr> is available at each address.
|
| 269 |
|
|
</p>
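<p>
The conversion described above can be pictured with a short Python sketch. It is
only an illustration, not the code of fib-cadb2sql.py: the table name, the column
names, their order, and the sample rows are all assumptions made for the example,
and the real export may be laid out differently.
</p>

```python
import csv
import io
import sqlite3

# Assumed column order; the real export's header names and order may differ.
COLUMNS = ["street_no", "street_name", "community", "apt_no",
           "county", "latitude", "longitude"]


def load_tsv(conn, tsv_file):
    """Append every well-formed row of one tab-separated file to the table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS addresses ("
        "street_no TEXT, street_name TEXT, community TEXT, apt_no TEXT, "
        "county TEXT, latitude REAL, longitude REAL, "
        "dsl_status INTEGER)"  # hypothetical column, left NULL until checked
    )
    reader = csv.reader(tsv_file, delimiter="\t")
    # Skip rows that do not have exactly one value per expected column.
    rows = [row for row in reader if len(row) == len(COLUMNS)]
    conn.executemany(
        "INSERT INTO addresses (%s) VALUES (?, ?, ?, ?, ?, ?, ?)"
        % ", ".join(COLUMNS),
        rows,
    )
    conn.commit()
    return len(rows)


# Two made-up rows standing in for a downloaded county file.
sample = io.StringIO(
    "500\tMAIN ST\tEXAMPLEVILLE\t\tQUEENS\t46.25\t-63.13\n"
    "510\tMAIN ST\tEXAMPLEVILLE\t\tQUEENS\t46.26\t-63.14\n"
)
conn = sqlite3.connect(":memory:")
loaded = load_tsv(conn, sample)
```

<p>
Calling <b>load_tsv</b> once per county file against the same connection would
combine several TSV files into one database, which is the effect the command
above is described as having.
</p>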
|
| 270 |
|
|
<a name="4.3" />
|
| 271 |
|
|
<h3>4.3. Crawling for <abbr title="Digital Subscriber Loop">DSL</abbr> coverage</h3>
|
| 272 |
|
|
<p>
|
| 273 |
|
|
The sqlite civic address database has an empty column for <abbr title="Digital Subscriber Loop">DSL</abbr>
|
| 274 |
|
|
availability. Filling it in is where the real <b>bot</b> part of
|
| 275 |
|
|
forgottenislanderbot comes in.
|
| 276 |
|
|
</p>
|
| 277 |
|
|
<a name="4.3.1" />
|
| 278 |
|
|
<h4>4.3.1. Robot etiquette</h4>
|
| 279 |
|
|
<p>
|
| 280 |
|
|
<abbr title="ForgottenIslanderBot">FIB</abbr> is in a class of computer programs known as <i>bots</i>,
|
| 281 |
|
|
<i>robots</i>, <i>web robots</i>, <i>webbots</i>, or <i>crawlers</i>.
|
| 282 |
|
|
These programs load web pages by themselves, without human
|
| 283 |
|
|
help -- they often load more web pages more quickly than
|
| 284 |
|
|
a human could, and this is usually what they're written for. A
|
| 285 |
|
|
subset of bots known as <i>spiders</i> follow web links from
|
| 286 |
|
|
site to site; <abbr title="ForgottenIslanderBot">FIB</abbr> is not one of these.
|
| 287 |
|
|
</p>
|
| 288 |
|
|
<p>
|
| 289 |
|
|
Because of robots' abilities to request a vast number of web
|
| 290 |
|
|
pages, they are sometimes viewed as a nuisance or a threat. Some
|
| 291 |
|
|
webmasters (if they are on top of things) might not want bots to
|
| 292 |
|
|
view certain pages on their website (like a <abbr title="Digital Subscriber Loop">DSL</abbr> availability
|
| 293 |
|
|
page), or they may not want bots at all. To let co-operative
|
| 294 |
|
|
bots know of such policies, a standard exists around a
|
| 295 |
|
|
<i>robots.txt</i> file, which can be put in the highest level of
|
| 296 |
|
|
a web server. Compliant bots will check for this file, and
|
| 297 |
|
|
interpret it to determine what they may view.
|
| 298 |
|
|
</p>
|
| 299 |
|
|
<p>
|
| 300 |
|
|
<abbr title="ForgottenIslanderBot">FIB</abbr> checks for a robots.txt and, to an extent, tries
to work out what it is permitted to fetch. If it is forbidden to access its
target, however, it will ask the operator for permission to override the
file, and fetch the data anyway. In other words, <abbr title="ForgottenIslanderBot">FIB</abbr> was written
to treat robots.txt as an inconvenient formality, and to shift the
ethical burden of overriding it from the programmer to the operator.<br />
|
| 307 |
|
|
When I last checked, Bell Aliant did not have a robots.txt file.
|
| 308 |
|
|
If <abbr title="ForgottenIslanderBot">FIB</abbr> has much of an effect, they just might put one in. ;-)<br />
|
| 309 |
|
|
Furthermore, there was nothing in their website Terms of Service
|
| 310 |
|
|
forbidding bots specifically -- just DoS attacks.
|
| 311 |
|
|
</p>
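<p>
As a rough sketch of the check-then-ask behaviour described above, the following
uses the robots.txt parser from the standard library of current Python 3
(<b>urllib.robotparser</b>; Python 2 called the module <b>robotparser</b>). It is
not FIB's own code: the function name, the user-agent string, the example URLs,
and the <b>ask_operator</b> callback are all hypothetical.
</p>

```python
from urllib import robotparser


def may_fetch(robots_lines, user_agent, url, ask_operator):
    """Return True if robots.txt allows the URL, or the operator overrides it."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_lines)  # parse rules already in memory; no network use
    if parser.can_fetch(user_agent, url):
        return True
    # Forbidden by robots.txt: leave the decision to the operator.
    return bool(ask_operator(url))


# A made-up robots.txt that forbids everything under /private/.
rules = ["User-agent: *", "Disallow: /private/"]

allowed = may_fetch(rules, "examplebot",
                    "http://example.com/public/page", lambda url: False)
refused = may_fetch(rules, "examplebot",
                    "http://example.com/private/page", lambda url: False)
overridden = may_fetch(rules, "examplebot",
                       "http://example.com/private/page", lambda url: True)
```

<p>
Passing the operator's decision in as a function keeps the sketch testable; an
interactive tool would presumably prompt at the command-line instead.
</p>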
|
| 312 |
|
|
<p>
|
| 313 |
|
|
A <i>Denial-of-Service (DoS)</i> attack is a hostile action where
|
| 314 |
|
|
a webserver is bombarded with numerous requests, from malicious
|
| 315 |
|
|
bots. At some point it will become overloaded and be unable to
|
| 316 |
|
|
serve any more pages -- hence the denial-of-service. Robot
|
| 317 |
|
|
operators and programmers must be careful to avoid appearing as
|
| 318 |
|
|
a DoS attack.<br />
|
| 319 |
|
|
Many web servers have monitoring software, which can sound
|
| 320 |
|
|
alarms if unusually large amounts of page requests are received.
|
| 321 |
|
|
Law-abiding bot operators should avoid generating alarming
|
| 322 |
|
|
amounts of internet traffic.
|
| 323 |
|
|
</p>
<p>
|
| 324 |
|
|
<abbr title="ForgottenIslanderBot">FIB</abbr> is <i>single-threaded</i> -- it requests web pages one at a
|
| 325 |
|
|
time -- so it is unlikely to appear as a DoS threat. (Most DoS
|
| 326 |
|
|
attackers use between twenty and several thousand requests per
|
| 327 |
|
|
second.) Furthermore, <abbr title="ForgottenIslanderBot">FIB</abbr> is easily configured to appear as less
|
| 328 |
|
|
of a threat:<dl>
|
| 329 |
|
|
<dt><b>interval</b>, or <b>delay</b>
|
| 330 |
|
|
</dt>
|
| 331 |
|
|
<dd>the time the bot pauses between loading each
|
| 332 |
|
|
address' <abbr title="Digital Subscriber Loop">DSL</abbr> status. Given to the bot in seconds, but
|
| 333 |
|
|
can be expressed as a decimal. The longer the interval,
|
| 334 |
|
|
the less threatening it appears to any monitoring
|
| 335 |
|
|
software.</dd>
|
| 336 |
|
|
<dt>
|
| 337 |
|
|
<b>sparsity</b>
|
| 338 |
|
|
</dt>
|
| 339 |
|
|
<dd>how many addresses to skip, and not check.
|
| 340 |
|
|
Technically, the sparsity is the minimum difference
|
| 341 |
|
|
between civic numbers of checked addresses, evaluated on
|
| 342 |
|
|
a per-road basis. To get every house, set to 1. To get
|
| 343 |
|
|
at least one house per road, and about every thousandth
|
| 344 |
|
|
house on the same road, set to 1000. For example, with a
|
| 345 |
|
|
sparsity of 10, it would check #500, if it was
|
| 346 |
|
|
the first house on the road, not #509, but also
|
| 347 |
|
|
#510, also #560, and, if the next was
|
| 348 |
|
|
#610, that too.</dd>
|
| 349 |
|
|
</dl>
|
| 350 |
|
|
In practice, a one-second delay is probably safe. The PEI civic
|
| 351 |
|
|
address database contains, at last check, <nobr>68 023</nobr>
|
| 352 |
|
|
addresses (homes and businesses). At a moderate sparsity, say 40, the
|
| 353 |
|
|
bot will get about <nobr>10 000</nobr> addresses -- more than
|
| 354 |
|
|
enough for a detailed map. However, if you check all <nobr>68
|
| 355 |
|
|
023</nobr> addresses (with a sparsity of 1), not only will you
|
| 356 |
|
|
be able to produce maps with any density (see <a href="#4.4.1">
|
| 357 |
|
|
section 4.4.1</a> for details on this), but you will have a
|
| 358 |
|
|
precise number of how many addresses Bell Aliant's website says
|
| 359 |
|
|
can't get <abbr title="Digital Subscriber Loop">DSL</abbr>! With any sparsity above 1, you'd have to estimate, which
|
| 360 |
|
|
isn't quite as accurate. The highest sparsity possible -- one
|
| 361 |
|
|
per road (which can be entered as 2781, or higher) -- without
|
| 362 |
|
|
excluding cities -- is still 5108 addresses.
|
| 363 |
|
|
As they are the first on a road, not the last, that might give a
|
| 364 |
|
|
lower count of no-<abbr title="Digital Subscriber Loop">DSL</abbr> addresses than actually exist.
|
| 365 |
|
|
</p>
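<p>
The per-road sparsity rule described above can be sketched in a few lines of
Python. This is a reading of the description, not FIB's actual code, and the
function name is made up for the illustration.
</p>

```python
def select_by_sparsity(civic_numbers, sparsity):
    """Pick the civic numbers on one road that a given sparsity would check."""
    selected = []
    for number in sorted(civic_numbers):
        # Always take the first house on the road; after that, require a gap
        # of at least `sparsity` from the last house that was taken.
        if not selected or number - selected[-1] >= sparsity:
            selected.append(number)
    return selected


road = [500, 509, 510, 560, 610]
print(select_by_sparsity(road, 10))    # [500, 510, 560, 610]
print(select_by_sparsity(road, 1))     # [500, 509, 510, 560, 610]
print(select_by_sparsity(road, 1000))  # [500]
```

<p>
The first call reproduces the example above: #509 is skipped because it is
only 9 away from #500, while #510, #560, and #610 are each at least 10 past
the previously checked house.
</p>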
|
| 366 |
|
|
<a name="4.3.2" />
|
| 367 |
|
|
<h4>4.3.2. Planning the crawling</h4>
|
| 368 |
|
|
<p>
|
| 369 |
|
|
Before running the bot, you should decide on the sparsity and
|
| 370 |
|
|
the delay. In choosing these, you should be aware how long it
|
| 371 |
|
|
will take to run. To find out, you must know a) how many
|
| 372 |
|
|
addresses your given sparsity will check, and b) how long of a
|
| 373 |
|
|
delay to allow, plus how long it takes to check an address.
|
| 374 |
|
|
</p>
|
| 375 |
|
|
<p>
|
| 376 |
|
|
One of the forgottenislanderbot tools lets you see how many
|
| 377 |
|
|
addresses will be produced by a given sparsity. Assuming your
|
| 378 |
|
|
civic-address sqlite database is <b>database.sqlite</b>, and it
|
| 379 |
|
|
is in the same folder as <abbr title="ForgottenIslanderBot">FIB</abbr>, run:
|
| 380 |
|
|
<div class="cmd"><nobr>python fib-dbutil.py database.sqlite
|
| 381 |
|
|
--stats yyyy <i>sparsity</i>
|
| 382 |
|
|
<i>n</i>
|
| 383 |
|
|
</nobr>
|
| 384 |
|
|
</div>
|
| 385 |
|
|
<br />
|
| 386 |
|
|
where
|
| 387 |
|
|
<dl>
|
| 388 |
|
|
<dt>sparsity</dt>
|
| 389 |
|
|
<dd>is the sparsity, for example 40, and</dd>
|
| 390 |
|
|
<dt>n</dt>
|
| 391 |
|
|
<dd>is <b>n</b>, as in no, unless you wish to
|
| 392 |
|
|
specifically exclude Charlottetown and Summerside (on
|
| 393 |
|
|
the assumption high-speed is readily available), in
|
| 394 |
|
|
which case you use <b>y</b>.</dd>
|
| 395 |
|
|
</dl>
|
| 396 |
|
|
That command will produce output like the following:
|
| 397 |
|
|
<pre> ERROR : 0
|
| 398 |
|
|
EMPTY : 5109
|
| 399 |
|
|
NODSL : 0
|
| 400 |
|
|
BASIC : 0
|
| 401 |
|
|
ULTRA : 0
|
| 402 |
|
|
TOTAL : 5109</pre>
|
| 403 |
|
|
The second value from the top, EMPTY, is the one to observe -- with that
|
| 404 |
|
|
sparsity (2780), 5109 addresses would be checked. As this example
|
| 405 |
|
|
shows an empty database, TOTAL can also be used as the guideline.
|
| 406 |
|
|
</p>
|
| 407 |
|
|
<p>
|
| 408 |
|
|
To estimate the time the bot will take to run, multiply the
|
| 409 |
|
|
number of addresses to be checked by the
|
| 410 |
|
|
time to check each address, which is the sum of the actual time
|
| 411 |
|
|
to load the page and the bot's delay. On a dial-up connection, the
|
| 412 |
|
|
page might take 12 seconds to load; on broadband, it might take
|
| 413 |
|
|
only <sup>1</sup>/<sub>20</sub><sup>th</sup> of a second,
|
| 415 |
|
|
although in practice it could easily take one second. For
|
| 416 |
|
|
example, with a half-second page load and a one-second delay:<br />
|
| 417 |
|
|
<nobr>
|
| 418 |
|
|
<b>68 000 × ( <sup>1</sup>/<sub>2</sub> + 1 ) = 102 000
|
| 419 |
|
|
seconds</b>
|
| 420 |
|
|
</nobr>
|
| 421 |
|
|
<br />
|
| 422 |
|
|
Divide seconds by 3600 to find hours -- in this example,
|
| 423 |
|
|
28<sup>1</sup>/<sub>3</sub>.
|
| 424 |
|
|
</p>
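<p>
The same arithmetic, written as a small Python helper for trying out other
sparsities and delays. The function name is made up for the illustration; the
figures are the ones from the example above.
</p>

```python
def estimate_run_time(addresses, load_seconds, delay_seconds):
    """Return the estimated run time as (seconds, hours)."""
    seconds = addresses * (load_seconds + delay_seconds)
    return seconds, seconds / 3600.0


seconds, hours = estimate_run_time(68000, 0.5, 1)
print("%d seconds, about %.1f hours" % (seconds, hours))
```

<p>
With the numbers above this gives 102 000 seconds, or a little over 28 hours.
</p>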
|
| 425 |
|
|
<p>
|
| 426 |
|
|
<abbr title="ForgottenIslanderBot">FIB</abbr> stores <abbr title="Digital Subscriber Loop">DSL</abbr>-status values in its sqlite database, and will
|
| 427 |
|
|
not recheck an address if its status is already known. Because
|
| 428 |
|
|
of this, you can run it incompletely, getting more of the database
|
| 429 |
|
|
checked each time. Furthermore, you can run it at decreasing
|
| 430 |
|
|
sparsity -- for example, first with a sparsity of 1000, then 40,
|
| 431 |
|
|
then 10, then 1. The easiest way to stop <abbr title="ForgottenIslanderBot">FIB</abbr> is to press <b>Control-C</b> at the
command-line it's running in (see <a href="#4.1">section 4.1</a> for how to reach the command-line).
|
| 433 |
|
|
</p>
|
| 434 |
|
|
<a name="4.3.3" />
|
| 435 |
|
|
<h4>4.3.3. Running the bot</h4>
|
| 436 |
|
|
<p>
|
| 437 |
|
|
You can continue to use the computer <abbr title="ForgottenIslanderBot">FIB</abbr> runs on -- it doesn't
|
| 438 |
|
|
use very much memory or processor power, and, with its delay,
|
| 439 |
|
|
not much network bandwidth. If the computer crashes, a few
|
| 440 |
|
|
addresses will have been forgotten, but once the bot is started
|
| 441 |
|
|
again it will recheck them. Keep in mind, though, that the
|
| 442 |
|
|
computer <abbr title="ForgottenIslanderBot">FIB</abbr> is running on will need to be left on, and logged
|
| 443 |
|
|
in.
|
| 444 |
|
|
</p>
|
| 445 |
|
|
<p>
|
| 446 |
|
|
Once you have decided on a sparsity and delay, you can invoke
|
| 447 |
|
|
the bot. Assuming your sqlite database is <b>database.sqlite</b>
|
| 448 |
|
|
and it's in the same folder as <abbr title="ForgottenIslanderBot">FIB</abbr>, run:
|
| 449 |
|
|
<div class="cmd"><nobr>python fib-crawlbot.py database.sqlite 1 1 n</nobr>
|
| 450 |
|
|
</div>
|
| 451 |
|
|
where, respectively,
|
| 452 |
|
|
<dl>
|
| 453 |
|
|
<dt>1</dt>
|
| 454 |
|
|
<dd>is the sparsity (every house),</dd>
|
| 455 |
|
|
<dt>1</dt>
|
| 456 |
|
|
<dd>is the delay (one second), and</dd>
|
| 457 |
|
|
<dt>n</dt>
|
| 458 |
|
|
<dd>as in no, means not to skip the cities.</dd>
|
| 459 |
|
|
</dl>
|
| 460 |
|
|
After a few seconds, in which there is a small chance of it
|
| 461 |
|
|
asking you to override a robots.txt, it will begin testing
|
| 462 |
|
|
addresses. It will display the full address of each it checks,
|
| 463 |
|
|
with a rough progress bar of dots, until it displays
|
| 464 |
|
|
<b>1</b>,<b>2</b>, or <b>3</b>, which correspond to no <abbr title="Digital Subscriber Loop">DSL</abbr>, 1.5
|
| 465 |
|
|
Mbps <abbr title="Digital Subscriber Loop">DSL</abbr>, and 7 Mbps <abbr title="Digital Subscriber Loop">DSL</abbr>, respectively. It will then sleep for
|
| 466 |
|
|
the specified delay, and go on to the next address.<br />
|
| 467 |
|
|
You can (and, unless you're highly bored, <i>should</i>) leave
|
| 468 |
|
|
it until it finishes, at which point it will write <b>done</b>
|
| 469 |
|
|
and return to the command-line. If you tell it to get the full
|
| 470 |
|
|
database, come back later, and find it done, you could run the
|
| 471 |
|
|
same command again -- it will only re-get addresses it failed to
|
| 472 |
|
|
fetch the first time (it should display error messages when that
|
| 473 |
|
|
happens).
|
| 474 |
|
|
</p>
|
| 475 |
|
|
<p>
|
| 476 |
|
|
Once the bot has fetched the full database, you can confirm it
|
| 477 |
|
|
checked every address, and see the raw-number totals, with this
|
| 478 |
|
|
command (assuming your <b>database.sqlite</b> is in the same
|
| 479 |
|
|
folder as <abbr title="ForgottenIslanderBot">FIB</abbr>):
|
| 480 |
|
|
<div class="cmd"><nobr>python fib-dbutil.py database.sqlite
|
| 481 |
|
|
--stats yyyy 1 n</nobr>
|
| 482 |
|
|
</div>
|
| 483 |
|
|
That will display the complete totals. For an example of the
|
| 484 |
|
|
output table, see <a href="#4.3.2">section 4.3.2</a>. If every
|
| 485 |
|
|
address was successfully checked, ERROR and EMPTY will both be
|
| 486 |
|
|
0. You can see the results in the NODSL, BASIC, and ULTRA rows.
|
| 487 |
|
|
</p>
|
| 488 |
|
|
<a name="4.4" />
|
| 489 |
|
|
<h3>4.4. Generating the map</h3>
|
| 490 |
|
|
<p>
|
| 491 |
|
|
The <b>--stats</b> command provides the totals of each status,
|
| 492 |
|
|
but forgottenislanderbot was written with the intention of
|
| 493 |
|
|
making a map. The map is to be displayed in virtual globe
|
| 494 |
|
|
software, such as <span class="attn">
|
| 495 |
|
|
<a href="http://earth.google.com/">Google Earth</a> or
|
| 496 |
|
|
<a href="http://worldwind.arc.nasa.gov/">NASA World Wind.</a></span>
|
| 497 |
|
|
At this moment, it has only been tested with Google Earth, and
|
| 498 |
|
|
by default uses map icons provided by Google Earth. Although
|
| 499 |
|
|
Google Earth is better-known and has more detailed imagery, it
|
| 500 |
|
|
is copyrighted, with restrictions on use; I think NASA World
|
| 501 |
|
|
Wind's imagery is public-domain, with no constraints on
|
| 502 |
|
|
modification or redistribution.
|
| 503 |
|
|
</p>
|
| 504 |
|
|
<p>
|
| 505 |
|
|
To display <abbr title="Digital Subscriber Loop">DSL</abbr> coverage on a virtual globe, forgottenislanderbot
|
| 506 |
|
|
generates a <abbr title="Keyhole Markup Language">KML</abbr>
|
| 507 |
|
|
file, which contains numerous lat/long coordinates matched with
|
| 508 |
|
|
a coloured icon. Virtual globe software will display the
|
| 509 |
|
|
designated icon on the map at the specified lat/long
|
| 510 |
|
|
coordinates, producing a detailed overlay of <abbr title="Digital Subscriber Loop">DSL</abbr> coverage on
|
| 511 |
|
|
PEI. The <abbr title="Keyhole Markup Language">KML</abbr> file is mostly useless unless it is displayed on a
|
| 512 |
|
|
virtual globe, but because it does not contain Google's imagery,
|
| 513 |
|
|
it can be redistributed widely. The <abbr title="Keyhole Markup Language">KML</abbr> file is usually less
|
| 514 |
|
|
than 4MB in size. However, it can be compressed to under
|
| 515 |
|
|
200kB, in a <abbr title="Keyhole Markup Language, Zipped">KMZ</abbr> file,
|
| 516 |
|
|
although <abbr title="ForgottenIslanderBot">FIB</abbr> doesn't support this yet.<br />
|
| 517 |
|
|
For privacy reasons, <abbr title="ForgottenIslanderBot">FIB</abbr> does not retain civic addresses in the
|
| 518 |
|
|
<abbr title="Keyhole Markup Language">KML</abbr> files it produces; this could be enabled, with an increase in
|
| 519 |
|
|
file size.
|
| 520 |
|
|
</p>
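<p>
To give an idea of what such a file contains, here is a minimal Python sketch
that builds a KML document with one coloured placemark per point. It is not the
code FIB uses: the function name, the style names (borrowed from the --stats
rows), and the use of a plain icon colour instead of Google Earth's own icons
are assumptions made for the example. The colours follow those listed in
section 4.4.2, assuming BASIC is 1.5 Mbps and ULTRA is 7 Mbps.
</p>

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

# KML colours are written aabbggrr (alpha, blue, green, red).
STATUS_COLOURS = {
    "EMPTY": "ffffffff",  # white: unchecked
    "NODSL": "ff00ffff",  # yellow: no DSL
    "BASIC": "ffffff00",  # cyan: assumed to be 1.5 Mbps
    "ULTRA": "ff00ff00",  # green: assumed to be 7 Mbps
}


def build_kml(points):
    """Build KML text from an iterable of (latitude, longitude, status)."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element("{%s}kml" % KML_NS)
    doc = ET.SubElement(kml, "{%s}Document" % KML_NS)
    # One shared style per status, so each placemark only needs a reference.
    for status, colour in STATUS_COLOURS.items():
        style = ET.SubElement(doc, "{%s}Style" % KML_NS, id=status)
        icon_style = ET.SubElement(style, "{%s}IconStyle" % KML_NS)
        ET.SubElement(icon_style, "{%s}color" % KML_NS).text = colour
    for lat, lon, status in points:
        mark = ET.SubElement(doc, "{%s}Placemark" % KML_NS)
        ET.SubElement(mark, "{%s}styleUrl" % KML_NS).text = "#" + status
        point = ET.SubElement(mark, "{%s}Point" % KML_NS)
        # KML expects longitude first, then latitude.
        coords = ET.SubElement(point, "{%s}coordinates" % KML_NS)
        coords.text = "%s,%s" % (lon, lat)
    return ET.tostring(kml, encoding="unicode")


kml_text = build_kml([(46.25, -63.13, "NODSL")])
```

<p>
Note that no street address is written to the placemark, in keeping with the
privacy choice described above.
</p>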
|
| 521 |
|
|
<a name="4.4.1" />
|
| 522 |
|
|
<h4>4.4.1. Producing the <abbr title="Keyhole Markup Language">KML</abbr> file</h4>
|
| 523 |
|
|
<p>
|
| 524 |
|
|
All <nobr>68 000</nobr> points are too much to expect a virtual
|
| 525 |
|
|
globe to display, due to memory and processing requirements. To
|
| 526 |
|
|
make lower-resolution maps, <abbr title="ForgottenIslanderBot">FIB</abbr> uses the same sparsity option
|
| 527 |
|
|
employed for crawling. The --stats command (see <a href="#4.3.2">section 4.3.2</a>) can be used to determine how
|
| 528 |
|
|
many points would be in the map. A sparsity of 40 produces a
|
| 529 |
|
|
rather cumbersome map.
|
| 530 |
|
|
</p>
|
| 531 |
|
|
<p>
|
| 532 |
|
|
Assuming your <b>database.sqlite</b> is in the same folder as
|
| 533 |
|
|
<abbr title="ForgottenIslanderBot">FIB</abbr>, and you wish to produce <b>map.<abbr title="Keyhole Markup Language">KML</abbr></b>, run:
|
| 534 |
|
|
<div class="cmd"><nobr>python fib-dbutil.py database.sqlite
|
| 535 |
|
|
--<abbr title="Keyhole Markup Language">KML</abbr> nyyy 40 n</nobr>
|
| 536 |
|
|
</div>
|
| 537 |
|
|
where
|
| 538 |
|
|
<dl>
|
| 539 |
|
|
<dt>nyyy</dt>
|
| 540 |
|
|
<dd>is the <i>status mask</i>--it sets which <abbr title="Digital Subscriber Loop">DSL</abbr>
|
| 541 |
|
|
statuses to display in the map. It comprises four
|
| 542 |
|
|
<b>y</b>/<b>n</b> yes/no values, each one defining
|
| 543 |
|
|
whether to display a particular <abbr title="Digital Subscriber Loop">DSL</abbr> status or not.
|
| 544 |
|
|
Respectively, the four statuses are unchecked, no <abbr title="Digital Subscriber Loop">DSL</abbr>,
|
| 545 |
|
|
1.5 Mbps <abbr title="Digital Subscriber Loop">DSL</abbr>, and 7 Mbps <abbr title="Digital Subscriber Loop">DSL</abbr>. In the example, unchecked
|
| 546 |
|
|
addresses will not be displayed. To display only addresses
|
| 547 |
|
|
without <abbr title="Digital Subscriber Loop">DSL</abbr> availability, use <b>nynn</b>. Similar terms
|
| 548 |
|
|
influence the --stats command; </dd>
|
| 549 |
|
|
<dt>40</dt>
|
| 550 |
|
|
<dd>is the sparsity; to get every address, use 1,
|
| 551 |
|
|
although that is inadvisable unless you're only
|
| 552 |
|
|
displaying no-<abbr title="Digital Subscriber Loop">DSL</abbr> addresses, with 'nynn'; and</dd>
|
| 553 |
|
|
<dt>n</dt>
|
| 554 |
|
|
<dd>is no, meaning not to skip the cities.</dd>
|
| 555 |
|
|
</dl>
|
| 556 |
|
|
</p>
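<p>
The status mask can be read as four switches, one per status. The Python sketch
below shows one way to interpret it; the function name is made up, the status
names are borrowed from the --stats output, and the assumption that BASIC means
1.5 Mbps and ULTRA means 7 Mbps is not confirmed by this document.
</p>

```python
# Order follows the mask: unchecked, no DSL, 1.5 Mbps, 7 Mbps.
STATUSES = ["EMPTY", "NODSL", "BASIC", "ULTRA"]


def parse_status_mask(mask):
    """Return the names of the statuses a y/n mask switches on."""
    flags = mask.lower()
    if len(flags) != len(STATUSES) or set(flags) - set("yn"):
        raise ValueError("mask must be four y/n characters, e.g. 'nyyy'")
    return [name for flag, name in zip(flags, STATUSES) if flag == "y"]


print(parse_status_mask("nyyy"))  # ['NODSL', 'BASIC', 'ULTRA']
print(parse_status_mask("nynn"))  # ['NODSL']
```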
<a name="4.4.2" />
<h4>4.4.2. Using the <abbr title="Keyhole Markup Language">KML</abbr> with Google Earth</h4>
<p>
Once you have generated your <abbr title="Keyhole Markup Language">KML</abbr> file, you have to load it into
your virtual globe. In some operating systems, you can
double-click the <abbr title="Keyhole Markup Language">KML</abbr> file, or run it like a program from your
command line:
<div class="cmd"><nobr>map.<abbr title="Keyhole Markup Language">KML</abbr></nobr>
</div>
However, most likely you will load it from your virtual globe
program itself. In Google Earth, click the <i>File&gt;Open...</i> menu
item, select the <abbr title="Keyhole Markup Language">KML</abbr> file (which will probably be in the
same folder as <abbr title="ForgottenIslanderBot">FIB</abbr>), and click <i>Open</i>. If you don't want to
do this each time you launch Google Earth, you can drag the
<i>forgottenislanderbot</i> item from the Temporary Places
branch of the sidebar into the My Places branch.<br />
The procedure for NASA World Wind is probably not much
different.
</p>
<p>
At present, unchecked addresses appear white,
no-<abbr title="Digital Subscriber Loop">DSL</abbr> addresses
appear yellow, 1.5 Mbps <abbr title="Digital Subscriber Loop">DSL</abbr>
addresses appear cyan, and 7 Mbps <abbr title="Digital Subscriber Loop">DSL</abbr>
addresses appear green.
</p>
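<p>
KML writes colours as eight hexadecimal digits in alpha, blue, green, red order,
not the red, green, blue order used in HTML. The sketch below shows how the four
colours named above could be expressed that way. The names <b>kml_color</b>,
<b>STATUS_COLORS</b>, and <b>icon_style</b> are hypothetical, and the exact
colour values FIB writes are an assumption.
</p>

```python
# Hypothetical sketch; the exact colour values FIB writes are an assumption.
def kml_color(red, green, blue, alpha=255):
    """Format a colour in KML's aabbggrr hexadecimal order."""
    return "%02x%02x%02x%02x" % (alpha, blue, green, red)


STATUS_COLORS = {
    "unchecked": kml_color(255, 255, 255),  # white
    "no DSL": kml_color(255, 255, 0),  # yellow
    "1.5 Mbps DSL": kml_color(0, 255, 255),  # cyan
    "7 Mbps DSL": kml_color(0, 255, 0),  # green
}


def icon_style(style_id, color):
    """Build a KML Style element that tints a placemark icon."""
    return ('<Style id="%s"><IconStyle><color>%s</color></IconStyle></Style>'
            % (style_id, color))


print(icon_style("no-dsl", STATUS_COLORS["no DSL"]))
```

<p>
Because the channel order is reversed, yellow becomes <b>ff00ffff</b> rather
than the <b>ffff00</b> familiar from HTML.
</p>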
<a name="4.4.3" />
<h4>4.4.3. Distributing the map</h4>
<p>
The <abbr title="Keyhole Markup Language">KML</abbr> file can be distributed through email,
websites or blogs, or physical media for transferring
digital data, like CDs or USB drives. It can be
compressed in a zip archive, so long as it is
unzipped before use. Zipping does not help a <abbr title="Keyhole Markup Language, Zipped">KMZ</abbr> file, though, because it is already zip-compressed.<br />
Images from virtual globes can be saved, although you
may want to check the imagery licenses before
distributing them.
</p>
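<p>
A KMZ file is a zip archive that holds a KML file, conventionally named
<b>doc.kml</b>, so a map can also be packaged as KMZ directly. The following is
a minimal sketch using Python's standard zipfile module. The function name
<b>kml_to_kmz</b> and the file names are hypothetical, and FIB itself is not
known to include such a step.
</p>

```python
import os
import tempfile
import zipfile


def kml_to_kmz(kml_path, kmz_path):
    """Store one KML file in a KMZ archive under the name doc.kml."""
    with zipfile.ZipFile(kmz_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(kml_path, arcname="doc.kml")
    return kmz_path


# Demonstration with a placeholder KML file in a temporary folder.
folder = tempfile.mkdtemp()
kml_path = os.path.join(folder, "map.kml")
with open(kml_path, "w") as handle:
    handle.write('<?xml version="1.0" encoding="UTF-8"?>\n'
                 '<kml xmlns="http://www.opengis.net/kml/2.2">'
                 '<Document></Document></kml>\n')

kmz_path = kml_to_kmz(kml_path, os.path.join(folder, "map.kmz"))
with zipfile.ZipFile(kmz_path) as archive:
    print(archive.namelist())
```

<p>
The resulting file opens in a virtual globe without being unzipped first.
</p>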
<a href="#0">[ top ]</a>
<hr />
<a name="5" />
<h2>5. Licenses</h2>
<h3>forgottenislanderbot</h3>
<p><pre>
forgottenislanderbot - makes a <abbr title="Digital Subscriber Loop">DSL</abbr> coverage map from Bell Aliant's website.
Copyright (c) 2010 Art Ortenburger

This program is free software; you can redistribute it and/or
modify it under the terms of the <a href="http://www.gnu.org/licenses/gpl.txt">GNU General Public License</a>
as published by the Free Software Foundation; either version 2
of the License, or any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

</pre></p>
<h3>fib.html -- this documentation</h3>
<p>
This document is licensed under the Creative Commons
Attribution-ShareAlike 2.5 Canada license. See footer for details.
</p>
<h3>Where to find licenses for data used by <abbr title="ForgottenIslanderBot">FIB</abbr></h3>
<p>
PEI government <a href="http://www.gov.pe.ca/civicaddress/">civic address databases</a> are not
specifically licensed on the government website, but
<ol>
<li>the <a href="http://www.gov.pe.ca/index.php3?number=1024403&amp;lang=E">website copyright page</a> states that
information on the site may be reproduced
without further permission for non-commercial
use, and</li>
<li>the <a href="http://www.gov.pe.ca/civicaddress/">civic address main page</a> lists the civic
address database with the subheading "USE IN
YOUR OWN APPLICATIONS".</li>
</ol>
</p>
<p>
Users of Google Earth must comply with Google's license
agreements for Google Earth, which can be found on Google's
website. Google Earth imagery may be under a separate license.
</p>
<p>
NASA World Wind must be used in accordance with the NASA
Open Source Agreement, which can be found on the NASA
World Wind website. NASA World Wind provides multiple
imagery layers. Some of these are in the public domain;
those which aren't are subject to a license agreement.
</p>
<p>
The Bell Aliant webpages are not retained by <abbr title="ForgottenIslanderBot">FIB</abbr>, and
the data stored is derived from the presence of various
phrases in the webpage, rather than an actual excerpt
from the website. Use of the Bell Aliant website is
governed by multiple agreements and terms of service,
which Bell Aliant publishes on its website.
</p>
<a href="#0">[ top ]</a>
<hr />
<div class="footer">Copyright © 2010 Art Ortenburger.<br />
<a rel="license" href="http://creativecommons.org/licenses/by-sa/2.5/ca/"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-sa/2.5/ca/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/2.5/ca/">Creative Commons Licence</a>.</div>
</body>
</html>