🔦 QRMine

/ˈkärmīn/


QRMine is a suite of qualitative research (QR) data mining tools in Python using Natural Language Processing (NLP) and Machine Learning (ML). QRMine is a work in progress.

What it does

NLP

  • List common categories for open coding.

  • Create a coding dictionary with categories, properties and dimensions.

  • Topic modelling.

  • Arrange docs according to topics.

  • Compare two documents/interviews.

  • Select documents/interviews by sentiment, category or title for further analysis.

  • Sentiment analysis

ML

  • Accuracy of a neural network model trained using the data

  • Confusion matrix from a support vector machine classifier

  • K nearest neighbours of a given record

  • K-Means clustering

  • Principal Component Analysis (PCA)

  • Association rules

How to install

pip install qrmine
python -m spacy download en_core_web_sm

Mac users

  • Please install libomp for XGBoost.

brew install libomp

How to Use

  • Input files are transcripts as txt files and a single csv file with numeric data. The output txt file can be specified.

  • The coding dictionary, topics and topic assignments can be created from the entire corpus (all documents) using the respective command-line options.

  • Categories (concepts), summary and sentiment can be viewed for the entire corpus or for specific titles (documents) specified using the --titles switch. Sentence-level sentiment output is possible with the --sentence flag.

  • You can filter documents based on sentiment, titles or categories and do further analysis using --filters or -f.

  • Many of the ML functions, such as the neural network, take a second argument (-n). For nnet, -n is the number of epochs; for kmeans, the number of clusters; for pca, the number of factors; and for knn, the number of neighbours. KNN also takes the --rec or -r argument to specify the record.

  • Variables from the csv can be selected using --titles (defaults to all). The first variable will be ignored (index) and the last will be the DV (dependent variable).

Command-line options

qrmine --help

| Command | Alternate | Description |
| --- | --- | --- |
| --inp | -i | Input file in text format with <break>Topic</break> tags |
| --out | -o | Output file name |
| --csv | | csv file name |
| --num | -n | N (clusters/epochs etc. depending on context) |
| --rec | -r | Record (based on context) |
| --titles | -t | Document(s) title(s) to analyze/compare |
| --codedict | | Generate coding dictionary |
| --topics | | Generate topic model |
| --assign | | Assign documents to topics |
| --cat | | List categories of entire corpus or individual docs |
| --summary | | Generate summary for entire corpus or individual docs |
| --sentiment | | Generate sentiment score for entire corpus or individual docs |
| --nlp | | Generate all NLP reports |
| --sentence | | Generate sentence-level scores when applicable |
| --nnet | | Display accuracy of a neural network model; -n epochs (3) |
| --svm | | Display confusion matrix from an svm classifier |
| --knn | | Display nearest neighbours; -n neighbours (3) |
| --kmeans | | Display KMeans clusters; -n clusters (3) |
| --cart | | Display Association Rules |
| --pca | | Display PCA; -n factors (3) |
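
To show how the options in the table above might combine, here is a minimal sketch that assembles a command line in Python. The helper `build_command` and the file name `diabetes.csv` are hypothetical and not part of QRMine. Only flags listed in the table are used, and the sketch assumes they can be combined in this order.

```python
import shlex

# Report flags copied from the options table; no other flags are assumed.
REPORT_FLAGS = {"codedict", "topics", "assign", "cat", "summary", "sentiment",
                "nlp", "nnet", "svm", "knn", "kmeans", "cart", "pca"}


def build_command(report, inp=None, csv=None, out=None, num=None, rec=None):
    """Assemble a qrmine argument list (hypothetical helper, not QRMine's API)."""
    if report not in REPORT_FLAGS:
        raise ValueError(f"unknown report flag: {report}")
    args = ["qrmine"]
    if inp is not None:
        args += ["--inp", inp]
    if csv is not None:
        args += ["--csv", csv]
    if out is not None:
        args += ["--out", out]
    if num is not None:
        args += ["--num", str(num)]  # meaning depends on the report (epochs, clusters, ...)
    if rec is not None:
        args += ["--rec", str(rec)]  # record to look up, used by --knn
    args.append(f"--{report}")
    return args


# K-Means with 4 clusters on a hypothetical csv file.
cmd = build_command("kmeans", csv="diabetes.csv", num=4)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` once QRMine is installed; printing it first makes it easy to check the command before running it.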

Use it in your code

from qrmine import Content
from qrmine import Network
from qrmine import Qrmine
from qrmine import ReadData
from qrmine import Sentiment
from qrmine import MLQRMine
  • More instructions and a Jupyter notebook are available here.

Input file format

NLP

Individual documents or interview transcripts go in a single text file, separated by <break>Topic</break> tags. Example below:

Transcript of the first interview with John.
Any number of lines
<break>First_Interview_John</break>

Text of the second interview with Jane.
More text.
<break>Second_Interview_Jane</break>

....

Multiple files are supported, each having only one break tag at the bottom with the topic. (The tag may be renamed in the future.)
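
As an illustration of the layout above, the following sketch splits such a file into titled documents using only the standard library. It is not QRMine's own reader. The function name `split_transcripts` is hypothetical, and the sketch assumes each document's text comes before its closing break tag, as in the example.

```python
import re

# Matches a tag such as <break>First_Interview_John</break>.
BREAK_TAG = re.compile(r"<break>(.*?)</break>")


def split_transcripts(text):
    """Return {title: body}, taking each body from the text preceding its tag."""
    documents = {}
    start = 0
    for match in BREAK_TAG.finditer(text):
        title = match.group(1).strip()
        documents[title] = text[start:match.start()].strip()
        start = match.end()  # the next document begins after this tag
    return documents


sample = """Transcript of the first interview with John.
Any number of lines
<break>First_Interview_John</break>

Text of the second interview with Jane.
More text.
<break>Second_Interview_Jane</break>
"""

docs = split_transcripts(sample)
for title, body in docs.items():
    print(title, "->", len(body.splitlines()), "lines")
```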

ML

A single csv file with the following generic structure.

  • Column 1 with identifier. If it is related to a text document as above, include the title.

  • Last column has the dependent variable (DV). (NLP algorithms like the topic assignments may provide the DV)

  • All independent variables (numerical) in between.

index, obesity, bmi, exercise, income, bp, fbs, has_diabetes
1, 0, 29, 1, 12, 120, 89, 1
2, 1, 32, 0, 9, 140, 92, 0
......
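
The column convention can be sketched with the standard `csv` module: the first column is treated as the identifier, the last as the DV, and everything in between as independent variables. This is an illustration of the layout only, not how QRMine loads data. The function name `split_columns` is hypothetical.

```python
import csv
import io

sample = """index, obesity, bmi, exercise, income, bp, fbs, has_diabetes
1, 0, 29, 1, 12, 120, 89, 1
2, 1, 32, 0, 9, 140, 92, 0
"""


def split_columns(handle):
    """Split rows into identifiers, independent variables (X) and the DV (y)."""
    # skipinitialspace drops the blank that follows each comma in the example.
    reader = csv.reader(handle, skipinitialspace=True)
    header = next(reader)
    ids, X, y = [], [], []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        ids.append(row[0])
        X.append([float(value) for value in row[1:-1]])
        y.append(float(row[-1]))
    return header, ids, X, y


header, ids, X, y = split_columns(io.StringIO(sample))
print("DV column:", header[-1])
print("IV columns:", header[1:-1])
```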

Author

Citation

Please cite QRMine in your publications if it helped your research. Here is an example BibTeX entry (Read paper on arXiv):


@article{eapenbr2019qrmine,
  title={QRMine: A python package for triangulation in Grounded Theory},
  author={Eapen, Bell Raj and Archer, Norm and Sartipi, Kamran},
  journal={arXiv preprint arXiv:2003.13519},
  year={2020}
}

QRMine is inspired by this work and the associated paper.

Give us a star ⭐️

If you find this project useful, give us a star. It helps others discover the project.
