Phil 5.2.18

7:00 – 4:30 ASRC MKT

    • I am going to start calling runaway echo chambers Baudrillardian Stampedes: https://en.wikipedia.org/wiki/Simulacra_and_Simulation
    • GECCO 2018 paper list is full of swarming optimizers
    • CORNELL NEWSROOM is a large dataset for training and evaluating summarization systems. It contains 1.3 million articles and summaries written by authors and editors in the newsrooms of 38 major publications. The summaries are obtained from search and social metadata between 1998 and 2017 and use a variety of summarization strategies combining extraction and abstraction.
    • More Ultimate Angular
      • Template Fundamentals (interpolation – #ref)
    • Now that I have my corpora, time to figure out how to build an embedding
    • Installing gensim
      • By now, gensim is—to my knowledge—the most robust, efficient and hassle-free piece of software to realize unsupervised semantic modelling from plain text. It stands in contrast to brittle homework-assignment-implementations that do not scale on one hand, and robust java-esque projects that take forever just to run “hello world”.
      • Big install. Didn’t break TF, which is nice
    • How to Develop Word Embeddings in Python with Gensim
      • Following the tutorial. Here’s a plot! (figure: W2V)
    • I need to redo the parser so that each line of the file is one sentence.
      • Sentences are strings that begin with a [CR] or [SPACE] + [WORD] and end with [WORD] + [.] or [“]
      • A [CR] preceded by anything other than a [.] or [“] is in the middle of a sentence
      • A fantastic regex tool! https://regex101.com/
        • regex = re.compile(r"([-!?\.]\"|[!?\.])")
      • After running into odd edge cases, I decided to load each book as a single string, parse it, then write out the individual lines. Works great except for the last step, where I can’t seem to iterate over an array of strings. Calling it a day.
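For reference while working through the gensim Word2Vec tutorial: skip-gram Word2Vec learns from (target, context) word pairs taken from a sliding window over each sentence. Below is a minimal plain-Python sketch of how those pairs are generated. `skipgram_pairs` is a hypothetical helper, not part of gensim, and gensim's real training loop adds things this omits (random window shrinking, frequent-word subsampling, negative sampling). Because the window never crosses a sentence boundary, the way the corpus is split into sentences changes what the model sees.

```python
from typing import Iterable, List, Tuple


def skipgram_pairs(sentences: Iterable[List[str]], window: int = 2) -> List[Tuple[str, str]]:
    """Collect (target, context) pairs for tokens up to `window` positions apart.

    Hypothetical helper with a fixed window only; gensim's trainer also shrinks
    the window at random, subsamples frequent words and uses negative sampling.
    """
    pairs: List[Tuple[str, str]] = []
    for tokens in sentences:  # each sentence is handled separately: no cross-sentence pairs
        for i, target in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:  # a word is not its own context
                    pairs.append((target, tokens[j]))
    return pairs


corpus = [["the", "cat", "sat"], ["dogs", "bark"]]
pairs = skipgram_pairs(corpus, window=1)
print(pairs)
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat'), ('dogs', 'bark'), ('bark', 'dogs')]
```

Note that "sat" and "dogs" never pair up, since they sit in different sentences.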
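The two line-break rules in the parser notes (a [CR] after a terminal [.] or quote ends a sentence; any other [CR] is mid-sentence) can be sketched as a reflow pass over hard-wrapped text. This is a minimal sketch, assuming the raw book text is already in memory. `reflow` and `SENTENCE_END` are hypothetical names, and which quote characters actually close sentences in the books is an assumption, so straight and both curly quotes are accepted.

```python
from typing import List

# The notes write the closing quote as [“]; which quote character the books
# really use is an assumption, so straight and both curly quotes are accepted.
SENTENCE_END = (".", '"', "“", "”")


def reflow(raw: str) -> List[str]:
    """Join hard-wrapped lines into sentences using the [CR] rules."""
    sentences: List[str] = []
    current: List[str] = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:  # blank lines are skipped, not treated as breaks
            continue
        current.append(line)
        if line.endswith(SENTENCE_END):  # [CR] after a terminal: sentence ends here
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text with no terminal punctuation
        sentences.append(" ".join(current))
    return sentences


sample = "It was a dark\nand stormy night.\n“Who goes there?”\nsaid the guard.\n"
sentences = reflow(sample)
print(sentences)
# ['It was a dark and stormy night.', '“Who goes there?”', 'said the guard.']
```

A purely line-based rule like this separates dialogue from its attribution ("said the guard."); whether that matters for the embedding is a judgment call.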
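The regex from regex101 can be paired with `re.split` to break a whole-book string into sentences. Because the pattern has a capture group, `re.split` keeps the matched punctuation in its output, so it can be glued back onto the text before it. A minimal sketch; `split_sentences` is a hypothetical helper name.

```python
import re
from typing import List

# Pattern from the notes: terminal punctuation (or a dash) followed by a straight
# double quote, or bare terminal punctuation. The capture group makes re.split
# return the delimiters too, so they can be reattached to each sentence.
regex = re.compile(r"([-!?\.]\"|[!?\.])")


def split_sentences(text: str) -> List[str]:
    parts = regex.split(text)
    sentences: List[str] = []
    # parts alternates: text, delimiter, text, delimiter, ..., trailing text
    for body, end in zip(parts[0::2], parts[1::2]):
        sentence = (body + end).strip()
        if sentence:
            sentences.append(sentence)
    tail = parts[-1].strip()  # any text after the last delimiter
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences('He said "Stop." Then he left! Why? Unfinished'))
# ['He said "Stop."', 'Then he left!', 'Why?', 'Unfinished']
```

Abbreviations such as "Mr." will still be split into their own "sentence", so some cleanup afterwards is likely.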
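For the last step, writing the parsed sentences out one per line matches the layout gensim's `LineSentence` reader expects (one line = one sentence, tokens separated by whitespace). A minimal sketch, with hypothetical helper names and a temporary directory standing in for a real output path. If a loop like this yields single characters instead of sentences, the variable holds one string rather than a list of strings.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def write_sentences(sentences: Iterable[str], path: Path) -> None:
    """Write one sentence per line, collapsing internal whitespace and newlines."""
    with path.open("w", encoding="utf-8") as f:
        for sentence in sentences:  # iterates over the strings in the list
            f.write(" ".join(sentence.split()) + "\n")


def read_sentences(path: Path) -> List[str]:
    """Read the file back, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "book_sentences.txt"  # hypothetical output file name
    write_sentences(["First sentence.", "Second\nsentence."], out)
    roundtrip = read_sentences(out)

print(roundtrip)  # ['First sentence.', 'Second sentence.']
```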

 
