Phil 7.7.20

The opportunity cost of this is going to be so steep. I wonder what country will set up an effective, open, online university?

f1

GPT-2 Agents

  • Working through the texthero examples. Spent a lot of time figuring out how to print elements from a row in a DataFrame, which was ridiculously hard. Instead, I just turned the DataFrame into a dict and worked with that:
    from typing import Dict, List

    import pandas as pd

    # Print the first num_rows rows of a DataFrame using the specified columns.
    # Use num_rows = -1 to print all rows.
    def print_df(df:pd.DataFrame, headers:List, num_rows:int = 4, max_chars:int = 80):
        rows = 0

        # Convert to {index: {column: value}} so each row is a plain dict
        d:Dict = df.to_dict('index')
        rd:Dict
        for index, rd in d.items():
            # Check the limit first so exactly num_rows rows are printed
            if num_rows != -1 and rows >= num_rows:
                break
            st = ""
            for key in headers:
                if key in rd:
                    # str() lets non-string values (numbers, NaN) be truncated too
                    st += "{}: {}, ".format(key, str(rd[key])[:max_chars])
            print(st)
            rows += 1
  • The scatterplot appears to use plotly, since it’s presented in the browser. That’s kind of cool, since it implies that plotly’s plotting functions are free to use. The plotly.com website confirms it: “Plotly.py is free and open source and you can view the source, report issues or contribute on GitHub.” That would be worth digging into some more. Here’s the PCA plot:

pca

  • You can make word clouds easily, too

WordCloud
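
Coming back to the row-printing problem from the first bullet: a minimal sketch of doing the same thing without converting to a dict, by selecting the columns first and walking the rows with iterrows(). The name format_rows and the toy DataFrame are made up for illustration; this is one possible approach, not what the texthero examples do.

```python
import pandas as pd


def format_rows(df, headers, num_rows=4, max_chars=80):
    """Return one formatted string per row, using only the requested columns."""
    # Silently skip headers that are not columns in the DataFrame
    cols = [h for h in headers if h in df.columns]
    # head(-1) would drop the last row, so treat -1 ("all rows") separately
    subset = df[cols] if num_rows == -1 else df[cols].head(num_rows)
    lines = []
    for _, row in subset.iterrows():
        # str() so that numbers and NaN can be truncated like text
        parts = ["{}: {}".format(c, str(row[c])[:max_chars]) for c in cols]
        lines.append(", ".join(parts))
    return lines


# Hypothetical data, just to exercise the function
df = pd.DataFrame({"title": ["first post", "second post"],
                   "text": ["a" * 200, "short"]})
for line in format_rows(df, ["title", "text"], num_rows=1, max_chars=10):
    print(line)
```

Returning the strings instead of printing them directly makes the helper easier to check, and the caller can still print them in a loop.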

GOES

  • Finish training? Oops, forgot
  • Some discussion with Vadim about the structure of the control

ML Seminar

  • Good discussion on topic extraction over time. Basically, create k topics from the entire corpus. Each topic is a ranking of all the words in the corpus. Behavior over time is then the amount of each topic's top words that appears in each time sample t.
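
A minimal sketch of the measurement described above, with some assumptions: the topic rankings are hand-written here (in practice they would come from a topic model fit on the whole corpus), "amount" is taken to mean the fraction of tokens in a time sample that are among a topic's top_n words, and the names topic_share_over_time and top_n are made up for illustration.

```python
from collections import Counter
from typing import Dict, List


def topic_share_over_time(topic_rankings: Dict[str, List[str]],
                          time_samples: List[List[str]],
                          top_n: int = 3) -> List[Dict[str, float]]:
    """For each time sample t, compute the fraction of tokens that belong
    to the top_n ranked words of each topic k."""
    # Keep only the top_n words of each topic's full ranking
    top_words = {k: set(words[:top_n]) for k, words in topic_rankings.items()}
    series = []
    for tokens in time_samples:
        counts = Counter(tokens)
        total = sum(counts.values())
        shares = {}
        for k, words in top_words.items():
            hits = sum(counts[w] for w in words)
            # Guard against empty time samples
            shares[k] = hits / total if total else 0.0
        series.append(shares)
    return series


# Hypothetical rankings: every topic ranks every word in the vocabulary
topic_rankings = {
    "sports":   ["game", "team", "score", "vote", "law", "senate"],
    "politics": ["vote", "law", "senate", "game", "team", "score"],
}
# Hypothetical tokenized text, one list of tokens per time sample
time_samples = [
    ["game", "team", "game", "vote"],
    ["vote", "law", "senate", "game"],
]
series = topic_share_over_time(topic_rankings, time_samples, top_n=3)
for t, shares in enumerate(series):
    print(t, shares)
```

Plotting each topic's share against t would then show how the topics rise and fall over time.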