Skip to content

bobanj/article-categorizer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

31 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Guesses the category of a given article, depending from previously parsed articles
Decission is determined from parsed articles  from www.a1.com.mk

Instalation:

Get the project
  	 git clone git://github.com/bobanj/article_categorizer_mk.git
Create your database.yml into the config dir.
Install needed libs for Nokogiri
 Linux:
    install these libs: libxslt1-dev libxml2-dev
    install gem with: gem install nokogiri
 Windows:
    install with: gem install nokogiri -s http://tenderlovemaking.com/
Install the other gems by running:
    rake gems:install
Prepare the database:
    rake db:create
    rake db:migrate
    rake db:seed;
Get the articles:
    First make sure that http://www.a1.com.mk is alive :)
    You can run one rake process to get all the articles if you dont mind waiting
    the rake task to parse all the articles with:
        rake media:get_articles #(better use start and count params because the training is done
                                # after getting the articles - due to performance)
    Or you can run multiple rake tasks to get all the data faster by setting a start and cont parameters
        rake media:get_articles start=50 count=10000
        where start is the newsid from a1 and count is the number of articles that will be parsed
        so you can run multiple rake tasks to get more articles at the same time:
        rake media:get_articles start=10000 count=9999
        rake media:get_articles start=20000 count=9999
        rake media:get_articles start=30000 count=9999
        (you get the point)
    Dont worry if there is an overlap, one article cant be inserted multiple times

    After getting all the articles you need to start the training:
Training
    rake media:train_all #Please be patient
After the training is finished you can set a crone task to get new articles every day and train with them
    rake media:get_articles train=true
Create manualy one account # I'll change this in future commits
    User.create :email => your_email, :password => your_pass, :password_confirmation => your_pass
Start the server


#########
# TO DO #
#########
 - Manage Users ?
 - Write learining in db ?
 - New layout

About

Article Categorizer Using Bayes

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published