Open
5122 commits
c0595c7
crawls
ToSBackCrawler Jul 7, 2020
2877a08
changes for reviewed docs
ToSBackCrawler Jul 8, 2020
f77d3b8
crawls
ToSBackCrawler Jul 8, 2020
5a4deb5
changes for reviewed docs
ToSBackCrawler Jul 9, 2020
4085ec0
crawls
ToSBackCrawler Jul 9, 2020
2cf69d6
changes for reviewed docs
ToSBackCrawler Jul 10, 2020
a84e6cd
crawls
ToSBackCrawler Jul 10, 2020
01645b3
changes for reviewed docs
ToSBackCrawler Jul 11, 2020
7af5f8b
crawls
ToSBackCrawler Jul 11, 2020
dc18141
crawls
ToSBackCrawler Jul 12, 2020
07c021b
changes for reviewed docs
ToSBackCrawler Jul 13, 2020
71b9f01
crawls
ToSBackCrawler Jul 13, 2020
5376f33
changes for reviewed docs
ToSBackCrawler Jul 14, 2020
bcae016
crawls
ToSBackCrawler Jul 14, 2020
8903de9
changes for reviewed docs
ToSBackCrawler Jul 15, 2020
50979d8
crawls
ToSBackCrawler Jul 15, 2020
edaaa00
changes for reviewed docs
ToSBackCrawler Jul 16, 2020
b0fae77
crawls
ToSBackCrawler Jul 16, 2020
a4b477d
changes for reviewed docs
ToSBackCrawler Jul 17, 2020
0120a14
crawls
ToSBackCrawler Jul 17, 2020
a110ab0
changes for reviewed docs
ToSBackCrawler Jul 18, 2020
f616bde
crawls
ToSBackCrawler Jul 18, 2020
c1bb98e
changes for reviewed docs
ToSBackCrawler Jul 19, 2020
3343bfc
crawls
ToSBackCrawler Jul 19, 2020
aad75cd
changes for reviewed docs
ToSBackCrawler Jul 20, 2020
80c5c8d
crawls
ToSBackCrawler Jul 20, 2020
3de0584
changes for reviewed docs
ToSBackCrawler Jul 21, 2020
146eed7
crawls
ToSBackCrawler Jul 21, 2020
5ad1c45
changes for reviewed docs
ToSBackCrawler Jul 22, 2020
be3acb6
crawls
ToSBackCrawler Jul 22, 2020
c216ca1
changes for reviewed docs
ToSBackCrawler Jul 23, 2020
bf0ca2d
crawls
ToSBackCrawler Jul 23, 2020
822f010
changes for reviewed docs
ToSBackCrawler Jul 24, 2020
4e9eb86
crawls
ToSBackCrawler Jul 24, 2020
b853e6e
changes for reviewed docs
ToSBackCrawler Jul 25, 2020
62ab86c
crawls
ToSBackCrawler Jul 25, 2020
200e871
changes for reviewed docs
ToSBackCrawler Jul 26, 2020
300861c
crawls
ToSBackCrawler Jul 26, 2020
9f64d81
changes for reviewed docs
ToSBackCrawler Jul 27, 2020
ad3e8dc
crawls
ToSBackCrawler Jul 27, 2020
3d47e43
changes for reviewed docs
ToSBackCrawler Jul 28, 2020
5acac7a
crawls
ToSBackCrawler Jul 28, 2020
4e45f43
changes for reviewed docs
ToSBackCrawler Jul 29, 2020
82e82c0
crawls
ToSBackCrawler Jul 29, 2020
2943dba
changes for reviewed docs
ToSBackCrawler Jul 30, 2020
d3ba3dd
crawls
ToSBackCrawler Jul 30, 2020
95dcfa7
changes for reviewed docs
ToSBackCrawler Jul 31, 2020
2cc3f2b
crawls
ToSBackCrawler Jul 31, 2020
a0d56b5
changes for reviewed docs
ToSBackCrawler Aug 1, 2020
5027154
crawls
ToSBackCrawler Aug 1, 2020
255b7f2
changes for reviewed docs
ToSBackCrawler Aug 2, 2020
d99f1f6
crawls
ToSBackCrawler Aug 2, 2020
ef1215c
changes for reviewed docs
ToSBackCrawler Aug 3, 2020
fb370c2
crawls
ToSBackCrawler Aug 3, 2020
11ab9cc
changes for reviewed docs
ToSBackCrawler Aug 4, 2020
a5064be
crawls
ToSBackCrawler Aug 4, 2020
7fb3cff
changes for reviewed docs
ToSBackCrawler Aug 5, 2020
aa14208
crawls
ToSBackCrawler Aug 5, 2020
7fca21d
changes for reviewed docs
ToSBackCrawler Aug 6, 2020
f0cfe1c
crawls
ToSBackCrawler Aug 6, 2020
08d49a7
Move amazon_cam.xml into amazon.com.xml
michielbdejong Aug 6, 2020
dc06cdd
Fix #71
michielbdejong Aug 6, 2020
ee50177
Delete empty and commented-out rule files
michielbdejong Aug 6, 2020
45b7b71
Fix amazon.com.xml XML syntax
michielbdejong Aug 6, 2020
e71bd21
More XML syntax corrections
michielbdejong Aug 6, 2020
4708e4f
changes for reviewed docs
ToSBackCrawler Aug 7, 2020
d582f17
crawls
ToSBackCrawler Aug 7, 2020
9c258d7
Comply with https://github.com/ambanum/CGUs/pull/88
michielbdejong Aug 7, 2020
940c8d2
Some more doc type changes
michielbdejong Aug 7, 2020
dd12d2c
changes for reviewed docs
ToSBackCrawler Aug 8, 2020
3a75982
crawls
ToSBackCrawler Aug 8, 2020
234f747
changes for reviewed docs
ToSBackCrawler Aug 9, 2020
fc3ba80
crawls
ToSBackCrawler Aug 9, 2020
4c1a108
changes for reviewed docs
ToSBackCrawler Aug 10, 2020
d2dee4b
crawls
ToSBackCrawler Aug 10, 2020
9f758b1
changes for reviewed docs
ToSBackCrawler Aug 11, 2020
cbac248
crawls
ToSBackCrawler Aug 11, 2020
e3568a0
changes for reviewed docs
ToSBackCrawler Aug 12, 2020
36f916f
crawls
ToSBackCrawler Aug 12, 2020
725a923
changes for reviewed docs
ToSBackCrawler Aug 13, 2020
9fd702c
crawls
ToSBackCrawler Aug 13, 2020
9556c8b
changes for reviewed docs
ToSBackCrawler Aug 14, 2020
da38455
crawls
ToSBackCrawler Aug 14, 2020
8795be5
changes for reviewed docs
ToSBackCrawler Aug 15, 2020
dfee579
crawls
ToSBackCrawler Aug 15, 2020
00e5841
changes for reviewed docs
ToSBackCrawler Aug 16, 2020
633ec0b
crawls
ToSBackCrawler Aug 16, 2020
752eca3
changes for reviewed docs
ToSBackCrawler Aug 17, 2020
5223ab7
crawls
ToSBackCrawler Aug 17, 2020
5267c6b
changes for reviewed docs
ToSBackCrawler Aug 18, 2020
f7b53af
crawls
ToSBackCrawler Aug 18, 2020
af08a56
changes for reviewed docs
ToSBackCrawler Aug 19, 2020
93260e1
crawls
ToSBackCrawler Aug 19, 2020
3bd9a90
changes for reviewed docs
ToSBackCrawler Aug 20, 2020
12c2bd3
crawls
ToSBackCrawler Aug 20, 2020
53a0b6a
changes for reviewed docs
ToSBackCrawler Aug 21, 2020
5767035
crawls
ToSBackCrawler Aug 21, 2020
55ca163
changes for reviewed docs
ToSBackCrawler Aug 22, 2020
94fea6c
crawls
ToSBackCrawler Aug 22, 2020
1e5a98f
changes for reviewed docs
ToSBackCrawler Aug 23, 2020
f5668cb
crawls
ToSBackCrawler Aug 23, 2020
9b5623e
changes for reviewed docs
ToSBackCrawler Aug 24, 2020
dbda333
crawls
ToSBackCrawler Aug 24, 2020
4a19bcf
changes for reviewed docs
ToSBackCrawler Aug 25, 2020
b482033
crawls
ToSBackCrawler Aug 25, 2020
d28a220
changes for reviewed docs
ToSBackCrawler Aug 26, 2020
4a4a7e2
crawls
ToSBackCrawler Aug 26, 2020
7e12976
changes for reviewed docs
ToSBackCrawler Aug 27, 2020
d2068cb
crawls
ToSBackCrawler Aug 27, 2020
cd2ee69
changes for reviewed docs
ToSBackCrawler Aug 28, 2020
b10b738
crawls
ToSBackCrawler Aug 28, 2020
2786b85
add mysql connection back
JimmStout Aug 28, 2020
f060947
changes for reviewed docs
ToSBackCrawler Aug 29, 2020
7a3e6ce
crawls
ToSBackCrawler Aug 29, 2020
c16a276
changes for reviewed docs
ToSBackCrawler Aug 30, 2020
5fd8014
crawls
ToSBackCrawler Aug 30, 2020
5881bf7
changes for reviewed docs
ToSBackCrawler Aug 31, 2020
8fb7f07
crawls
ToSBackCrawler Aug 31, 2020
d9ab7f2
changes for reviewed docs
ToSBackCrawler Sep 1, 2020
9159455
crawls
ToSBackCrawler Sep 1, 2020
b66e08e
changes for reviewed docs
ToSBackCrawler Sep 2, 2020
d9e5e77
crawls
ToSBackCrawler Sep 2, 2020
ea2d848
changes for reviewed docs
ToSBackCrawler Sep 3, 2020
dcc643d
crawls
ToSBackCrawler Sep 3, 2020
382e871
update bundler
ToSBackCrawler Sep 3, 2020
2b15843
changes for reviewed docs
ToSBackCrawler Sep 4, 2020
665c3bf
crawls
ToSBackCrawler Sep 4, 2020
a34251e
changes for reviewed docs
ToSBackCrawler Sep 5, 2020
0a86eda
crawls
ToSBackCrawler Sep 5, 2020
24dda7e
crawls
ToSBackCrawler Sep 6, 2020
fca0c07
crawls
ToSBackCrawler Sep 7, 2020
a66e2e7
changes for reviewed docs
ToSBackCrawler Sep 8, 2020
ca1fef2
crawls
ToSBackCrawler Sep 8, 2020
773dfbd
changes for reviewed docs
ToSBackCrawler Sep 9, 2020
cae7264
crawls
ToSBackCrawler Sep 9, 2020
889829b
changes for reviewed docs
ToSBackCrawler Sep 10, 2020
8a630af
crawls
ToSBackCrawler Sep 10, 2020
55e94a5
changes for reviewed docs
ToSBackCrawler Sep 11, 2020
e2f7308
crawls
ToSBackCrawler Sep 11, 2020
e741e7d
changes for reviewed docs
ToSBackCrawler Sep 12, 2020
fe22a99
crawls
ToSBackCrawler Sep 12, 2020
a5cc81c
changes for reviewed docs
ToSBackCrawler Sep 13, 2020
a22962a
crawls
ToSBackCrawler Sep 13, 2020
e1d15e7
changes for reviewed docs
ToSBackCrawler Sep 14, 2020
7531672
crawls
ToSBackCrawler Sep 14, 2020
cad84a5
changes for reviewed docs
ToSBackCrawler Sep 15, 2020
290eaf0
crawls
ToSBackCrawler Sep 15, 2020
89e7bdb
changes for reviewed docs
ToSBackCrawler Sep 16, 2020
ad045e9
crawls
ToSBackCrawler Sep 16, 2020
9c5a9f9
changes for reviewed docs
ToSBackCrawler Sep 17, 2020
53ebe2b
crawls
ToSBackCrawler Sep 17, 2020
f3d7a5c
changes for reviewed docs
ToSBackCrawler Sep 18, 2020
8e10d01
crawls
ToSBackCrawler Sep 18, 2020
78d4c7a
changes for reviewed docs
ToSBackCrawler Sep 19, 2020
de71028
crawls
ToSBackCrawler Sep 19, 2020
9abbda3
changes for reviewed docs
ToSBackCrawler Sep 20, 2020
725b6d4
crawls
ToSBackCrawler Sep 20, 2020
7192630
changes for reviewed docs
ToSBackCrawler Sep 21, 2020
0f4264e
crawls
ToSBackCrawler Sep 21, 2020
eb7a252
changes for reviewed docs
ToSBackCrawler Sep 22, 2020
cd914cf
crawls
ToSBackCrawler Sep 22, 2020
14562b8
changes for reviewed docs
ToSBackCrawler Sep 23, 2020
9c7500a
crawls
ToSBackCrawler Sep 23, 2020
d5dc113
changes for reviewed docs
ToSBackCrawler Sep 24, 2020
7f88b6f
crawls
ToSBackCrawler Sep 24, 2020
e702ee6
changes for reviewed docs
ToSBackCrawler Sep 25, 2020
b2ea955
crawls
ToSBackCrawler Sep 25, 2020
af73363
changes for reviewed docs
ToSBackCrawler Sep 26, 2020
34b3486
crawls
ToSBackCrawler Sep 26, 2020
025cc00
crawls
ToSBackCrawler Sep 27, 2020
5337017
crawls
ToSBackCrawler Sep 28, 2020
eeb47c0
crawls
ToSBackCrawler Sep 29, 2020
68ccd44
changes for reviewed docs
ToSBackCrawler Sep 30, 2020
d33782f
crawls
ToSBackCrawler Sep 30, 2020
2441c78
changes for reviewed docs
ToSBackCrawler Oct 1, 2020
433524f
crawls
ToSBackCrawler Oct 1, 2020
b38a88e
changes for reviewed docs
ToSBackCrawler Oct 2, 2020
9a8aeb7
crawls
ToSBackCrawler Oct 2, 2020
a17835a
changes for reviewed docs
ToSBackCrawler Oct 3, 2020
3b878a0
crawls
ToSBackCrawler Oct 3, 2020
7563ba7
crawls
ToSBackCrawler Oct 4, 2020
b62a173
crawls
ToSBackCrawler Oct 5, 2020
0bc8d73
changes for reviewed docs
ToSBackCrawler Oct 6, 2020
7ee395d
crawls
ToSBackCrawler Oct 6, 2020
9e04d09
changes for reviewed docs
ToSBackCrawler Oct 7, 2020
76f4cd0
crawls
ToSBackCrawler Oct 7, 2020
461b149
Merge remote-tracking branch 'origin/fix-71'
michielbdejong Oct 7, 2020
45b66f3
changes for reviewed docs
ToSBackCrawler Oct 7, 2020
3900b00
crawls
ToSBackCrawler Oct 7, 2020
7828428
Merge branch 'master' of github.com:tosdr/tosback2
ToSBackCrawler Oct 7, 2020
511f0d8
changes for reviewed docs
ToSBackCrawler Oct 8, 2020
7ff10f8
crawls
ToSBackCrawler Oct 8, 2020
fa4d841
changes for reviewed docs
ToSBackCrawler Oct 9, 2020
3370eb0
crawls
ToSBackCrawler Oct 9, 2020
aec27da
changes for reviewed docs
ToSBackCrawler Oct 10, 2020
c5f73c8
crawls
ToSBackCrawler Oct 10, 2020
207c140
changes for reviewed docs
ToSBackCrawler Oct 11, 2020
bda782f
crawls
ToSBackCrawler Oct 11, 2020
1ec0816
crawls
ToSBackCrawler Oct 12, 2020
22dba2d
changes for reviewed docs
ToSBackCrawler Oct 13, 2020
81b35e5
crawls
ToSBackCrawler Oct 13, 2020
368c16f
crawls
ToSBackCrawler Oct 14, 2020
40504a3
changes for reviewed docs
ToSBackCrawler Oct 15, 2020
cae2810
crawls
ToSBackCrawler Oct 15, 2020
579a560
changes for reviewed docs
ToSBackCrawler Oct 16, 2020
27143e2
crawls
ToSBackCrawler Oct 16, 2020
e0c57bf
changes for reviewed docs
ToSBackCrawler Oct 17, 2020
89dfd95
crawls
ToSBackCrawler Oct 17, 2020
22cfee3
changes for reviewed docs
ToSBackCrawler Oct 18, 2020
dd089e9
crawls
ToSBackCrawler Oct 18, 2020
1ac5393
crawls
ToSBackCrawler Oct 19, 2020
19f92c5
changes for reviewed docs
ToSBackCrawler Oct 20, 2020
98dc35d
crawls
ToSBackCrawler Oct 20, 2020
724ead4
crawls
ToSBackCrawler Oct 21, 2020
83d2937
changes for reviewed docs
ToSBackCrawler Oct 22, 2020
930af8e
crawls
ToSBackCrawler Oct 22, 2020
1408f7e
changes for reviewed docs
ToSBackCrawler Oct 23, 2020
03c4e67
crawls
ToSBackCrawler Oct 23, 2020
c0de0dc
changes for reviewed docs
ToSBackCrawler Oct 24, 2020
b575272
crawls
ToSBackCrawler Oct 24, 2020
730f1bf
crawls
ToSBackCrawler Oct 25, 2020
18fae80
changes for reviewed docs
ToSBackCrawler Oct 26, 2020
083d5de
crawls
ToSBackCrawler Oct 26, 2020
3972f4e
changes for reviewed docs
ToSBackCrawler Oct 27, 2020
dc97196
crawls
ToSBackCrawler Oct 27, 2020
da942bc
changes for reviewed docs
ToSBackCrawler Oct 28, 2020
d3f7a6e
crawls
ToSBackCrawler Oct 28, 2020
5341c61
changes for reviewed docs
ToSBackCrawler Oct 29, 2020
9b490d0
crawls
ToSBackCrawler Oct 29, 2020
160bd13
changes for reviewed docs
ToSBackCrawler Oct 30, 2020
13056b0
crawls
ToSBackCrawler Oct 30, 2020
263b24a
crawls
ToSBackCrawler Oct 31, 2020
5d14018
crawls
ToSBackCrawler Nov 1, 2020
7fd4982
changes for reviewed docs
ToSBackCrawler Nov 2, 2020
5e0d022
crawls
ToSBackCrawler Nov 2, 2020
59509b7
crawls
ToSBackCrawler Nov 3, 2020
8e1a780
crawls
ToSBackCrawler Nov 4, 2020
f16ea31
changes for reviewed docs
ToSBackCrawler Nov 5, 2020
918924a
changes for reviewed docs
ToSBackCrawler Nov 20, 2020
d7408a9
crawls
ToSBackCrawler Nov 20, 2020
01a1939
changes for reviewed docs
ToSBackCrawler Nov 21, 2020
40ad02e
crawls
ToSBackCrawler Nov 21, 2020
938218f
changes for reviewed docs
ToSBackCrawler Nov 22, 2020
dbb7aec
crawls
ToSBackCrawler Nov 22, 2020
6c6bf41
crawls
ToSBackCrawler Nov 23, 2020
2202f49
changes for reviewed docs
ToSBackCrawler Nov 24, 2020
c80eb7d
crawls
ToSBackCrawler Nov 24, 2020
a2fc2be
crawls
ToSBackCrawler Nov 25, 2020
6540088
changes for reviewed docs
ToSBackCrawler Nov 26, 2020
fefd60a
crawls
ToSBackCrawler Nov 26, 2020
7 changes: 7 additions & 0 deletions .gitignore
@@ -0,0 +1,7 @@
# OS generated files #
######################
.DS_Store
rubycode/lib/tosback_secrets.rb

# Ignore bundler config
/.bundle
17 changes: 17 additions & 0 deletions Gemfile
@@ -0,0 +1,17 @@
# A sample Gemfile
source "https://rubygems.org"

#ruby '1.9.3'
ruby '2.3.1'

# gem "rails"
gem "capybara"
gem "poltergeist"
gem "nokogiri", "~> 1.6.1"
gem "mechanize", "~> 2.6.0"
gem "sanitize", "~> 2.1.0"
gem "mail", "~> 2.5.4"
gem "activerecord", "~> 4.0.0"
group :production do
  gem "mysql2", "~> 0.3.13"
end
99 changes: 99 additions & 0 deletions Gemfile.lock
@@ -0,0 +1,99 @@
GEM
  remote: https://rubygems.org/
  specs:
    activemodel (4.0.4)
      activesupport (= 4.0.4)
      builder (~> 3.1.0)
    activerecord (4.0.4)
      activemodel (= 4.0.4)
      activerecord-deprecated_finders (~> 1.0.2)
      activesupport (= 4.0.4)
      arel (~> 4.0.0)
    activerecord-deprecated_finders (1.0.3)
    activesupport (4.0.4)
      i18n (~> 0.6, >= 0.6.9)
      minitest (~> 4.2)
      multi_json (~> 1.3)
      thread_safe (~> 0.1)
      tzinfo (~> 0.3.37)
    addressable (2.5.1)
      public_suffix (~> 2.0, >= 2.0.2)
    arel (4.0.2)
    builder (3.1.4)
    capybara (2.15.1)
      addressable
      mini_mime (>= 0.1.3)
      nokogiri (>= 1.3.3)
      rack (>= 1.0.0)
      rack-test (>= 0.5.4)
      xpath (~> 2.0)
    cliver (0.3.2)
    domain_name (0.5.18)
      unf (>= 0.0.5, < 1.0.0)
    i18n (0.6.9)
    mail (2.5.4)
      mime-types (~> 1.16)
      treetop (~> 1.4.8)
    mechanize (2.6.0)
      domain_name (~> 0.5, >= 0.5.1)
      mime-types (~> 1.17, >= 1.17.2)
      net-http-digest_auth (~> 1.1, >= 1.1.1)
      net-http-persistent (~> 2.5, >= 2.5.2)
      nokogiri (~> 1.4)
      ntlm-http (~> 0.1, >= 0.1.1)
      webrobots (>= 0.0.9, < 0.2)
    mime-types (1.25.1)
    mini_mime (0.1.3)
    mini_portile (0.5.3)
    minitest (4.7.5)
    multi_json (1.9.3)
    mysql2 (0.3.21)
    net-http-digest_auth (1.4)
    net-http-persistent (2.9.4)
    nokogiri (1.6.1)
      mini_portile (~> 0.5.0)
    ntlm-http (0.1.1)
    poltergeist (1.16.0)
      capybara (~> 2.1)
      cliver (~> 0.3.1)
      websocket-driver (>= 0.2.0)
    polyglot (0.3.4)
    public_suffix (2.0.5)
    rack (2.0.3)
    rack-test (0.7.0)
      rack (>= 1.0, < 3)
    sanitize (2.1.0)
      nokogiri (>= 1.4.4)
    thread_safe (0.3.3)
    treetop (1.4.15)
      polyglot
      polyglot (>= 0.3.1)
    tzinfo (0.3.39)
    unf (0.1.4)
      unf_ext
    unf_ext (0.0.6)
    webrobots (0.1.1)
    websocket-driver (0.6.5)
      websocket-extensions (>= 0.1.0)
    websocket-extensions (0.1.2)
    xpath (2.1.0)
      nokogiri (~> 1.3)

PLATFORMS
  ruby

DEPENDENCIES
  activerecord (~> 4.0.0)
  capybara
  mail (~> 2.5.4)
  mechanize (~> 2.6.0)
  mysql2 (~> 0.3.13)
  nokogiri (~> 1.6.1)
  poltergeist
  sanitize (~> 2.1.0)

RUBY VERSION
   ruby 2.3.1p112

BUNDLED WITH
   2.1.4
38 changes: 0 additions & 38 deletions README

This file was deleted.

39 changes: 39 additions & 0 deletions README.md
@@ -0,0 +1,39 @@
# ToSBack!

This is a Ruby implementation of ToSBack, designed to scrape the Privacy Policies and Terms of Service agreements from the sites defined in the rules folder.

## Rules

The log files in "logs" show when the script last ran and whether any of a rule's URLs needs to be updated. By default, tosback.rb grabs the body of a URL and tries to strip away the HTML before storing the policy. If a site comes back as modified every time the script runs (because ads or related links keep changing), you can add an xpath attribute to the url element in the rule's XML to pinpoint the ToS content on the page.

Here's an example:

    <docname name="Privacy Policy">
      <url name="http://www.500px.com/privacy" xpath="//div[@id='terms']">
        <norecurse name="arbitrary"/>
      </url>
    </docname>

Now, tosback.rb should only grab the content we want from that URL! Hooray!
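
To make the effect of the xpath attribute concrete, here is a minimal sketch of reading a rule and applying its xpath to a page. It is not the code in tosback.rb: it uses REXML (which ships with Ruby) instead of the project's own scraping stack, and `rule_targets`, `policy_text`, `all_text`, and the sample page are hypothetical names made up for illustration. REXML is an XML parser, so the sample page has to be well-formed.

```ruby
require "rexml/document"

# The rule from the example above.
RULE_XML = <<~XML
  <docname name="Privacy Policy">
    <url name="http://www.500px.com/privacy" xpath="//div[@id='terms']">
      <norecurse name="arbitrary"/>
    </url>
  </docname>
XML

# Hypothetical stand-in for a fetched page (well-formed, so REXML can parse it).
PAGE_XML = <<~XML
  <html>
    <body>
      <div id="ads">Rotating ad text</div>
      <div id="terms"><p>We respect your privacy.</p></div>
    </body>
  </html>
XML

# Returns [url, xpath] pairs for every <url> element in a rule document.
# The xpath entry is nil when the rule does not set the attribute.
def rule_targets(rule_xml)
  doc = REXML::Document.new(rule_xml)
  REXML::XPath.match(doc, "//url").map do |url|
    [url.attributes["name"], url.attributes["xpath"]]
  end
end

# Recursively collects the text of an element and all of its descendants.
def all_text(element)
  element.children.map do |child|
    case child
    when REXML::Text    then child.value
    when REXML::Element then all_text(child)
    else ""
    end
  end.join(" ")
end

# Returns the whitespace-normalized text under the first node matching
# xpath, falling back to the whole <body> when no xpath is given.
def policy_text(page_xml, xpath = nil)
  doc = REXML::Document.new(page_xml)
  node = REXML::XPath.first(doc, xpath || "//body")
  return nil if node.nil?

  all_text(node).split.join(" ")
end
```

With the rule's xpath, `policy_text(PAGE_XML, "//div[@id='terms']")` returns only the policy text, while calling it without an xpath also picks up the ad text, which is the kind of noise that makes a page look modified on every run.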

## Developing

This project requires ruby `2.3.1` and `phantomjs`.

After cloning the project, use the `--without production` option to install the required gems:

`$ bundle install --without production`

When the app runs without any options, it saves information to our database and automatically makes some new git commits, but this is probably only desirable in production. On your dev machine, run it like this to skip the db and auto-committing:

`rubycode$ ruby main.rb -dev`

You can also pass a rule file as an argument to the script to get a preview of the results! For example:

`rubycode$ ruby main.rb ../rules/abercrombie.com.xml`

This will only scrape and write the rule you pass, so you can add xpath data to a rule and quickly test to make sure it's correct.

Running with the "-empty" argument will scan the crawl directory and update the empty.log! Example:

`rubycode$ ruby main.rb -empty`
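
The invocations above can be summarized as a small argument-to-mode mapping. The sketch below is only an illustration of that mapping, not the actual logic in main.rb: the `run_mode` function and the mode names are hypothetical, and treating any argument ending in `.xml` as a rule file is an assumption.

```ruby
# Hypothetical sketch: maps the command-line arguments described above
# to a run mode. The real main.rb may handle its arguments differently.
def run_mode(argv)
  first = argv.first
  case first
  when nil
    # No options: save to the database and auto-commit (production).
    { mode: :production }
  when "-dev"
    # Skip the database and the automatic git commits.
    { mode: :dev }
  when "-empty"
    # Scan the crawl directory and update empty.log.
    { mode: :empty_scan }
  else
    # Assumption: anything ending in .xml is a single rule file to preview.
    unless first.end_with?(".xml")
      raise ArgumentError, "unrecognized argument: #{first}"
    end

    { mode: :single_rule, rule: first }
  end
end
```

For example, `run_mode(["../rules/abercrombie.com.xml"])` selects the single-rule preview and carries the rule path along, while `run_mode([])` selects the production behavior.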