Hyperparameters estimation for LDA #193

Open: wants to merge 36 commits into base: develop

36 commits
1aa7032
+ added optimization.h and .gitignore updated
alex2304 Nov 18, 2017
5e3e23c
[opt] dirichlet_optimizer class, digamma function
alex2304 Nov 19, 2017
b8dbc7d
[opt] minka_fpi method draft
alex2304 Nov 19, 2017
3dc03a8
[opt] optimization.h errors fixed, test without MeTa
alex2304 Nov 20, 2017
eeb9168
[opt] debug output
alex2304 Nov 20, 2017
766754f
Adding optimization.cpp
MakKolts Nov 20, 2017
98c3e7d
Merge branch 'develop' of https://github.com/alex2304/meta into develop
Nov 20, 2017
e9c99df
[opt] classes for methods in dirichlet_prior
alex2304 Nov 29, 2017
b11e704
Merge branch 'develop' of https://github.com/alex2304/meta into develop
Nov 29, 2017
54d7272
Deletion of previous stuff
MakKolts Nov 29, 2017
6585189
Test for dirichlet optimizations
MakKolts Nov 29, 2017
c0a357c
Private/public methods
MakKolts Nov 29, 2017
76d32ae
[opt] test indexes
alex2304 Nov 29, 2017
4ccda58
Interface for methods
MakKolts Nov 29, 2017
248c151
Refactoring of optimization interface
MakKolts Nov 29, 2017
61ece78
[opt] tmp for merge
alex2304 Nov 29, 2017
f979264
Tests for all functions at same time
MakKolts Nov 29, 2017
ba00c86
[opt] + term_ids()
alex2304 Nov 29, 2017
1f13f95
[opt] merged dirichlet_prior
alex2304 Nov 29, 2017
ed475b5
[opt] + first method without testing
alex2304 Nov 30, 2017
4528ec6
[opt] *first method builds
alex2304 Nov 30, 2017
312a485
[opt] * method works
alex2304 Nov 30, 2017
b60cc54
[opt] *first method debugged
alex2304 Nov 30, 2017
0a0851c
[opt] method refactored
alex2304 Nov 30, 2017
d726f70
[opt] + method2
alex2304 Nov 30, 2017
4a6a240
Adding constructors and register for new ranker classes
MakKolts Nov 30, 2017
f55e0de
Merge branch 'develop' of https://github.com/alex2304/meta into develop
MakKolts Nov 30, 2017
bc948ce
Add rankers to factory
MakKolts Nov 30, 2017
78d6d5c
[opt] + benchmark
alex2304 Nov 30, 2017
25d89d1
Merge branch 'develop' of https://github.com/alex2304/meta into develop
alex2304 Nov 30, 2017
5bc6ee6
Minor fix foor output
MakKolts Nov 30, 2017
4f8fa1d
[opt] + dirichlet_opt files
alex2304 Nov 30, 2017
c8ddfbf
[opt] + dirichlet_prior_opt
alex2304 Nov 30, 2017
f7b634a
[opt] + MacKay and Peto method
alex2304 Dec 1, 2017
d4b0a8d
[opt] + comments and docs
alex2304 Dec 3, 2017
001fac6
[opt] - test files
alex2304 Dec 4, 2017
2 changes: 2 additions & 0 deletions .gitignore
@@ -16,3 +16,5 @@ data/cranfield
biicode.conf
bii/
bin/
*.pro
*.pro.user
10 changes: 10 additions & 0 deletions include/meta/index/disk_index.h
@@ -87,6 +87,16 @@ class disk_index
*/
std::vector<doc_id> docs() const;

/**
* @return a vector of term_ids that are contained in this index
*/
std::vector<term_id> terms() const;

/**
* @return a vector of term_ids that are contained in the document with d_id
*/
std::vector<term_id> terms(doc_id d_id) const;

/**
* @param d_id The document to search for
* @return the size of the given document (the total number of terms
1 change: 1 addition & 0 deletions include/meta/index/ranker/all.h
@@ -1,6 +1,7 @@
#include "meta/index/ranker/ranker.h"
#include "meta/index/ranker/absolute_discount.h"
#include "meta/index/ranker/dirichlet_prior.h"
#include "meta/index/ranker/dirichlet_prior_opt.h"
#include "meta/index/ranker/jelinek_mercer.h"
#include "meta/index/ranker/lm_ranker.h"
#include "meta/index/ranker/okapi_bm25.h"
171 changes: 89 additions & 82 deletions include/meta/index/ranker/dirichlet_prior.h
@@ -1,82 +1,89 @@
/**
* @file dirichlet_prior.h
* @author Sean Massung
*
* All files in META are released under the MIT license. For more details,
* consult the file LICENSE in the root of the project.
*/

#ifndef META_DIRICHLET_PRIOR_H_
#define META_DIRICHLET_PRIOR_H_

#include "meta/index/ranker/lm_ranker.h"
#include "meta/index/ranker/ranker_factory.h"

namespace meta
{
namespace index
{

/**
* Implements Bayesian smoothing with a Dirichlet prior.
*
* Required config parameters:
* ~~~toml
* [ranker]
* method = "dirichlet-prior"
* ~~~
*
* Optional config parameters:
* ~~~toml
* mu = 2000.0
* ~~~

*/
class dirichlet_prior : public language_model_ranker
{
public:
/// Identifier for this ranker.
const static util::string_view id;

/// Default value of mu
const static constexpr float default_mu = 2000.0f;

/**
* @param mu
*/
dirichlet_prior(float mu = default_mu);

/**
* Loads a dirichlet_prior ranker from a stream.
* @param in The stream to read from
*/
dirichlet_prior(std::istream& in);

void save(std::ostream& out) const override;

/**
* Calculates the smoothed probability of a term.
* @param sd score_data for the current query
*/
float smoothed_prob(const score_data& sd) const override;

/**
* A document-dependent constant.
* @param sd score_data for the current query
*/
float doc_constant(const score_data& sd) const override;

private:
/// the Dirichlet prior parameter
const float mu_;
};

/**
* Specialization of the factory method used to create dirichlet_prior
* rankers.
*/
template <>
std::unique_ptr<ranker> make_ranker<dirichlet_prior>(const cpptoml::table&);
}
}
#endif
/**
* @file dirichlet_prior.h
* @author Sean Massung
*
* All files in META are released under the MIT license. For more details,
* consult the file LICENSE in the root of the project.
*/

#ifndef META_DIRICHLET_PRIOR_H_
#define META_DIRICHLET_PRIOR_H_

#include "meta/index/ranker/lm_ranker.h"
#include "meta/index/ranker/ranker_factory.h"

namespace meta
{
namespace index
{

/**
* Implements Bayesian smoothing with a Dirichlet prior.
*
* Required config parameters:
* ~~~toml
* [ranker]
* method = "dirichlet-prior"
* ~~~
*
* Optional config parameters:
* ~~~toml
* mu = 2000.0
* ~~~

*/
class dirichlet_prior : public language_model_ranker
{
public:
/// Identifier for this ranker.
const static util::string_view id;

/// Default value of mu
const static constexpr float default_mu = 2000.0f;

/**
* @param mu
*/
dirichlet_prior(float mu = default_mu);

/**
* Loads a dirichlet_prior ranker from a stream.
* @param in The stream to read from
*/
dirichlet_prior(std::istream& in);

void save(std::ostream& out) const override;

/**
* Calculates the smoothed probability of a term.
* @param sd score_data for the current query
*/
float smoothed_prob(const score_data& sd) const override;

/**
* A document-dependent constant.
* @param sd score_data for the current query
*/
float doc_constant(const score_data& sd) const override;

float parameter() const {
return mu_;
}

protected:
/// the Dirichlet prior parameter
// const float mu_;
float mu_;
};


/**
* Specialization of the factory method used to create dirichlet_prior
* rankers.
*/
template <>
std::unique_ptr<ranker> make_ranker<dirichlet_prior>(const cpptoml::table&);

}
}
#endif
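
For readers unfamiliar with what `mu_` controls, the following is a minimal sketch of textbook Dirichlet-prior smoothing, not the code from this pull request or from MeTA itself. The `term_stats` struct and the free functions are hypothetical stand-ins for `score_data`, `smoothed_prob`, and `doc_constant`; the formulas are the standard ones, p(w|d) = (c(w,d) + mu * p(w|C)) / (|d| + mu) and alpha_d = mu / (|d| + mu).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for the fields of score_data used by the ranker.
struct term_stats
{
    std::uint64_t doc_term_count;    // c(w, d): occurrences of the term in the document
    std::uint64_t doc_size;          // |d|: total number of terms in the document
    std::uint64_t corpus_term_count; // c(w, C): occurrences of the term in the corpus
    std::uint64_t total_terms;       // |C|: total number of terms in the corpus
};

// Smoothed probability: p(w | d) = (c(w, d) + mu * p(w | C)) / (|d| + mu).
inline double smoothed_prob(const term_stats& s, double mu)
{
    assert(mu > 0.0 && s.total_terms > 0);
    // Collection language model p(w | C), estimated by relative frequency.
    const double p_collection
        = static_cast<double>(s.corpus_term_count) / s.total_terms;
    return (s.doc_term_count + mu * p_collection) / (s.doc_size + mu);
}

// Document-dependent constant: alpha_d = mu / (|d| + mu).
// Longer documents get a smaller alpha_d, so they rely less on the prior.
inline double doc_constant(const term_stats& s, double mu)
{
    assert(mu > 0.0);
    return mu / (s.doc_size + mu);
}

// Log of the smoothed probability, the quantity typically summed per query term.
inline double log_smoothed_prob(const term_stats& s, double mu)
{
    return std::log(smoothed_prob(s, mu));
}
```

With mu fixed at a default such as 2000, these values depend only on index statistics. Making `mu_` non-const and exposing it through `parameter()`, as the diff above does, is what would let a derived ranker replace that default with an estimated value.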