- My research is as much about information science as literary criticism. I’m especially interested in applying machine learning to large digital collections.
- Git repo with code for upcoming book: Distant Horizons: Digital Evidence and Literary Change
- Do topic models warp time?
- The key observation I wanted to share is just that topic models produce a kind of curved space when applied to long timelines; if you’re measuring distances between individual topic distributions, it may not be safe to assume that your yardstick means the same thing at every point in time. This is not a reason for despair: there are lots of good ways to address the distortion. The mathematics of cosine distance tend to work better if you average the documents first, and then measure the cosine between the averages (or “centroids”).
- The Historical Significance of Textual Distances
- Measuring similarity is a basic task in information retrieval, and now often a building-block for more complex arguments about cultural change. But do measures of textual similarity and distance really correspond to evidence about cultural proximity and differentiation? To explore that question empirically, this paper compares textual and social measures of the similarities between genres of English-language fiction. Existing measures of textual similarity (cosine similarity on tf-idf vectors or topic vectors) are also compared to new strategies that use supervised learning to anchor textual measurement in a social context.
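A note to self on the centroid suggestion in the "warp time" item above: a minimal sketch, in plain Python, of averaging documents first and then taking the cosine distance between the averages, next to the mean of all document-to-document distances. The topic distributions and the names `early`, `late`, `centroid`, and `mean_pairwise_distance` are made up for illustration; none of this comes from the post or the book's repo.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def mean_pairwise_distance(group_a, group_b):
    """Average cosine distance over every cross-group pair of documents."""
    distances = [cosine_distance(a, b) for a in group_a for b in group_b]
    return sum(distances) / len(distances)


# Hypothetical topic distributions (3 topics) for two time slices.
early = [[0.7, 0.2, 0.1], [0.1, 0.7, 0.2]]
late = [[0.2, 0.1, 0.7], [0.6, 0.3, 0.1]]

# Average first, then measure: one distance between the two centroids.
centroid_dist = cosine_distance(centroid(early), centroid(late))

# Measure first, then average: mean of all document-to-document distances.
pairwise_dist = mean_pairwise_distance(early, late)

print(f"centroid distance:      {centroid_dist:.3f}")
print(f"mean pairwise distance: {pairwise_dist:.3f}")
```

On this toy data the two numbers differ, which is the point: the choice between "average, then measure" and "measure, then average" changes the yardstick. Whether the centroid version actually corrects the distortion on a long timeline would have to be checked on real topic vectors.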
7:00 – 8:00 ASRC MKT
- Continued on slides. I think I have the basics. Need to start looking for pictures.
- Sent response to the SASO folks about who’s presenting what.
9:00 – ASRC IRAD
- More flailing on A2P UI? Oh, yeah….
- More RNN/LSTM?
- Slow progress. Spent some time cleaning up my single neuron spreadsheet, which does make more sense now.
- Some papers and source pointed to by the text: