Consilience (with Merce Crosas, Gary King and Brandon Stewart). In Progress.
Consilience implements a general-purpose clustering methodology, introduced here, that facilitates discovery in large collections of texts and other data. We also introduce a general collection of cluster analysis algorithms, including implementations of spectral methods, nonparametric and parametric Bayesian methods, affinity propagation, and coclustering methods. Extensions include cluster generation without clustering and scaling to arbitrary groups of documents.
Patent: Method and Apparatus for Selecting Clusterings to Classify a Predetermined Data Set (US Patent No. 8,438,162 B2).
expAgenda provides a suite of methods for incorporating additional information into unsupervised learning models for text and other data. The package includes a model that estimates authors' expressed priorities in texts, introduced here and used in my first book. Extensions allow the inclusion of covariates and a second level of clustering to estimate a typology of expressed priorities. Methods for modeling legislative voting blocs and nonparametric topic models, introduced here, are also included. Code available upon request.
Time-series models in Zelig (Imai, King, and Lau 2007).