About

These Python scripts perform statistical analysis on the citation data generated by Webservice. They run multiple tests, each checking a different hypothesis.

To execute the tests, first set up your development environment:

$ cd [TVQ CLONE PATH]/analytics
$ virtualenv .venv
$ source .venv/bin/activate
# on Windows:
# .\.venv\Scripts\activate
$ pip install -r requirements.txt

Running the following command executes all the tests automatically:

$ python ./lib/run.py ...

You may type the following to see the command help:

$ python ./lib/run.py --help
usage: run.py [-h] {exe-all,cluster,plot-cluster,t-test,growth-hist,gain-score,pubs-in-clusters,tool-pub,citation-dist} ...

A command-line interface to the scripts implemented for analyzing the TVQ-generated citation data.

positional arguments:
  {exe-all,cluster,plot-cluster,t-test,growth-hist,gain-score,pubs-in-clusters,tool-pub,citation-dist}
                        Commands
    exe-all             Executes all the scripts in a predefined order.
    cluster             Clusters publications in repositories stored in CSV files in given input path.
    plot-cluster        Plot clusters and citation counts in quartiles.
    t-test              Performs t-test on the citation count of publications.
    growth-hist         Plots a histogram of the citation count growth.
    gain-score          Plots gain score.
    pubs-in-clusters    Plots publications in clusters.
    tool-pub            Plots tools in publications.
    citation-dist       Plots citations distribution.

optional arguments:
  -h, --help            show this help message and exit
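
For readers unfamiliar with this style of interface, the sketch below shows how a sub-command parser of this shape can be built with Python's standard argparse module. It is an illustration only and not the actual contents of run.py: the `COMMANDS` table and the `build_parser` function are hypothetical names, and only three of the sub-commands listed above are registered to keep it short.

```python
import argparse

# Sub-command names and help strings copied from the help output above.
# Only three are registered here; the rest would follow the same pattern.
COMMANDS = {
    "exe-all": "Executes all the scripts in a predefined order.",
    "cluster": "Clusters publications in repositories stored in CSV files in given input path.",
    "t-test": "Performs t-test on the citation count of publications.",
}


def build_parser():
    """Build a parser whose first positional argument selects a sub-command.

    Hypothetical helper for illustration; run.py may be organized differently.
    """
    parser = argparse.ArgumentParser(
        prog="run.py",
        description="Sketch of a sub-command style command-line interface.")
    # dest="command" stores the chosen sub-command name on the parsed namespace.
    subparsers = parser.add_subparsers(dest="command", help="Commands")
    for name, help_text in COMMANDS.items():
        subparsers.add_parser(name, help=help_text)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("selected sub-command:", args.command)
```

Each sub-parser created by `add_parser` gets its own `-h/--help` flag automatically, which is why every sub-command can print its own usage message.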

To see the arguments of a sub-command, pass --help to that sub-command; for instance:

$ python ./lib/run.py exe-all --help
usage: run.py exe-all [-h] [-c CLUSTER_COUNT] [-s SOURCE] [-g] [-d] input

positional arguments:
  input                 Path to directory containing input data.

optional arguments:
  -h, --help            show this help message and exit
  -c CLUSTER_COUNT, --cluster_count CLUSTER_COUNT
                        Groups data in the given number of clusters. If not provided, the cluster count is determined automatically using the Elbow method.
  -s SOURCE, --source SOURCE
                        Sets the cluster source.
  -g, --plot_changes    If set, plots changes on clustered citation counts. By default the changes are not plotted.
  -d, --plot_density    If set, plots probability instead of absolute values. Default is False.
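
The help text does not say how the scripts apply the Elbow method when the cluster count option is omitted. As a rough illustration of the idea, the sketch below uses one common heuristic: given a clustering cost (for example, the within-cluster sum of squares) for each candidate cluster count, it picks the count whose point lies farthest from the straight line joining the first and last points of the cost curve. The function names and the sample costs are made up for this example and are not taken from the project.

```python
import math


def elbow_index(costs):
    """Return the index of the cost farthest from the chord joining both ends."""
    n = len(costs)
    if n < 3:
        raise ValueError("need at least three cost values to locate an elbow")
    # End points of the chord, using the list index as the x coordinate.
    x1, y1 = 0, costs[0]
    x2, y2 = n - 1, costs[-1]
    norm = math.hypot(x2 - x1, y2 - y1)
    best_i, best_d = 0, -1.0
    for i, y in enumerate(costs):
        # Perpendicular distance from the point (i, y) to the chord.
        d = abs((y2 - y1) * i - (x2 - x1) * y + x2 * y1 - y2 * x1) / norm
        if d > best_d:
            best_i, best_d = i, d
    return best_i


def elbow_cluster_count(costs, first_k=1):
    """Map the elbow index back to a cluster count.

    costs[0] is assumed to be the cost for first_k clusters, costs[1] for
    first_k + 1 clusters, and so on.
    """
    return first_k + elbow_index(costs)


if __name__ == "__main__":
    # Made-up costs for k = 1..6, for illustration only.
    sample_costs = [100, 40, 15, 12, 10, 9]
    print(elbow_cluster_count(sample_costs))
```

With the made-up costs above, the curve flattens after the third value, so the heuristic returns 3. Other elbow criteria exist and may select a different count on the same data.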