Phil 2.17.21

Today we’re excited to release a big update to the Galaxy visualization, an interactive UMAP plot of graph embeddings of books and articles assigned in the Open Syllabus corpus! (This is using the new v2.5 release of the underlying dataset, which also comes out today.) The Galaxy is an attempt to give a 10,000-meter view of the “co-assignment” patterns in the OS data – basically, which books and articles are assigned together in the same courses. By training node embeddings on the citation graph formed from (syllabus, book/article) edges, we can get really high-quality representations of books and articles that capture the ways in which professional instructors use them in the classroom – the types of courses they’re assigned in, the other books they’re paired with, etc.
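
The co-assignment idea can be made concrete with a much simpler, count-based sketch. This is not the node-embedding approach described above. It only tallies how often two titles appear on the same syllabus, given a list of (syllabus, book/article) edges. The function name and the toy edges are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_assignment_counts(edges):
    """Count how often each pair of titles is assigned on the same syllabus.

    edges: iterable of (syllabus_id, title) pairs, i.e. the edges of a
    bipartite syllabus-title graph. (Hypothetical helper for illustration.)
    """
    titles_by_syllabus = defaultdict(set)
    for syllabus_id, title in edges:
        # A set drops duplicate edges for the same syllabus
        titles_by_syllabus[syllabus_id].add(title)

    pair_counts = Counter()
    for titles in titles_by_syllabus.values():
        # sorted() gives every unordered pair one canonical (a, b) key
        for a, b in combinations(sorted(titles), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Toy edges (hypothetical syllabi and titles)
edges = [
    ("syl1", "Leviathan"), ("syl1", "The Prince"), ("syl1", "Republic"),
    ("syl2", "Leviathan"), ("syl2", "The Prince"),
    ("syl3", "Republic"), ("syl3", "Elements of Style"),
]
counts = co_assignment_counts(edges)
print(counts.most_common(1))  # [(('Leviathan', 'The Prince'), 2)]
```

Raw pair counts like these grow quadratically with syllabus length and ignore indirect similarity (two books never assigned together but paired with the same third book), which is part of why embedding-based representations are attractive for a view like the Galaxy.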

Working with Antonio on an introduction to the journal

GPT Agents

# To get the week number (in the range 0-53) from a date

select count(*) as COUNT, extract(WEEK from created_at) as WEEK from twitter_root where text like '%chinavirus%' group by extract(WEEK from created_at) order by extract(WEEK from created_at);
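
A minimal way to try the weekly grouping without a MySQL server is Python's built-in sqlite3 module with a few toy rows. The table layout beyond the created_at and text columns, and all of the rows, are hypothetical. SQLite has no EXTRACT(WEEK ...), so this sketch substitutes strftime('%W'), which numbers weeks differently from MySQL (see the comments).

```python
import sqlite3

# In-memory stand-in for the MySQL twitter_root table (hypothetical toy rows)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE twitter_root (row_id INTEGER, created_at TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO twitter_root VALUES (?, ?, ?)",
    [
        (1, "2020-03-10 08:00:00", "so-called chinavirus again"),
        (2, "2020-03-11 09:30:00", "#chinavirus trending"),
        (3, "2020-03-17 12:00:00", "ChinaVirus is a slur"),  # LIKE is case-insensitive for ASCII in SQLite
        (4, "2020-03-17 13:00:00", "sars-cov-2 sequencing update"),  # no match
    ],
)

# strftime('%W') counts Monday-started weeks (00-53). MySQL's WEEK numbering
# depends on a mode setting (its default mode starts weeks on Sunday), so
# week boundaries can differ by a day between the two databases.
weekly = conn.execute(
    """
    SELECT CAST(strftime('%W', created_at) AS INTEGER) AS week, count(*) AS n
    FROM twitter_root
    WHERE text LIKE '%chinavirus%'
    GROUP BY week
    ORDER BY week
    """
).fetchall()
print(weekly)  # [(10, 2), (11, 1)]
```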

# To drill down from month to week, and from week to day, for March and April, since they have a significantly larger volume of data

select count(*) as COUNT, date(created_at) as DATE from twitter_root where text like '%chinavirus%' and (month(created_at) = 3 or month(created_at) = 4) and year(created_at) = 2020 group by date(created_at) order by date(created_at);
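
The same sqlite3 approach can sketch the daily drill-down. SQLite's date() and strftime() stand in for MySQL's DATE(), MONTH() and YEAR(); the rows are hypothetical and include one February tweet and one 2021 tweet to show the month and year filters working.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE twitter_root (row_id INTEGER, created_at TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO twitter_root VALUES (?, ?, ?)",
    [
        (1, "2020-02-28 10:00:00", "chinavirus"),      # February: filtered out
        (2, "2020-03-16 10:00:00", "chinavirus"),
        (3, "2020-03-16 18:00:00", "the chinavirus"),
        (4, "2020-04-02 07:00:00", "chinavirus?"),
        (5, "2021-03-16 10:00:00", "chinavirus"),      # wrong year: filtered out
    ],
)

# strftime returns zero-padded strings, so compare against '03', '04', '2020'
daily = conn.execute(
    """
    SELECT date(created_at) AS day, count(*) AS n
    FROM twitter_root
    WHERE text LIKE '%chinavirus%'
      AND strftime('%m', created_at) IN ('03', '04')
      AND strftime('%Y', created_at) = '2020'
    GROUP BY day
    ORDER BY day
    """
).fetchall()
print(daily)  # [('2020-03-16', 2), ('2020-04-02', 1)]
```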

Figured out how to get the full text of extended (long) tweets, falling back to the regular text field otherwise:

# Use the extended tweet's full_text when there is one; otherwise fall back to the regular text field
create or replace view long_text_view as
select tr.row_id, tr.created_at,
case when tr.extended_tweet_row_id <> 0 then et.full_text else tr.text end as long_text
from twitter_root tr
left join extended_tweet et on tr.extended_tweet_row_id = et.row_id;
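
Here is a small, self-contained check of the view logic using sqlite3, assuming (as the `<> 0` test implies) that extended_tweet_row_id is 0 when a tweet has no extended version. Any columns beyond those the view references, and all of the rows, are hypothetical. SQLite does not support CREATE OR REPLACE VIEW, so the sketch drops and recreates the view instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE twitter_root (
        row_id INTEGER, created_at TEXT, text TEXT,
        extended_tweet_row_id INTEGER  -- assumed 0 when there is no extended tweet
    );
    CREATE TABLE extended_tweet (row_id INTEGER, full_text TEXT);

    INSERT INTO twitter_root VALUES
        (1, '2020-03-16 10:00:00', 'short tweet', 0),
        (2, '2020-03-16 11:00:00', 'a long tweet that gets trunc...', 7);
    INSERT INTO extended_tweet VALUES
        (7, 'a long tweet that gets truncated in the text field');

    -- SQLite has no CREATE OR REPLACE VIEW, so drop and recreate instead
    DROP VIEW IF EXISTS long_text_view;
    CREATE VIEW long_text_view AS
    SELECT tr.row_id, tr.created_at,
           CASE WHEN tr.extended_tweet_row_id <> 0 THEN et.full_text
                ELSE tr.text END AS long_text
    FROM twitter_root tr
    LEFT JOIN extended_tweet et ON tr.extended_tweet_row_id = et.row_id;
    """
)

rows = conn.execute(
    "SELECT row_id, long_text FROM long_text_view ORDER BY row_id"
).fetchall()
print(rows)
# [(1, 'short tweet'), (2, 'a long tweet that gets truncated in the text field')]
```

One design note: if extended_tweet_row_id is non-zero but has no matching extended_tweet row, the LEFT JOIN leaves et.full_text NULL, so long_text comes back NULL. Assuming no extended_tweet row has row_id 0, writing coalesce(et.full_text, tr.text) instead of the CASE expression would also fall back to the regular text in that case.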

This is kind of cool. The chinavirus term is the peakiest but has low volume. The sars-cov-2 term is also low volume but flatter, and the more common terms look pretty similar to each other, with gentler peaks and slower falloff. I want to know what causes the dip around week 9, and the peak at week 5 in the chinavirus plot.
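
One rough way to put numbers on "peaky" versus "flat", and to locate dips and peaks like the ones above, is a peak-to-mean ratio plus a scan for local maxima and minima. This is a hypothetical sketch on made-up weekly counts, not an analysis of the actual data; the function names are illustrative only.

```python
def peak_to_mean(counts):
    """Ratio of the largest weekly count to the mean weekly count.

    A rough 'peakiness' score: a flat series scores near 1, while a series
    dominated by a single spike scores much higher.
    """
    return max(counts) / (sum(counts) / len(counts))


def local_extrema(counts):
    """Return (peaks, dips): indices whose value is strictly higher (or
    strictly lower) than both neighbours. Endpoints are never reported."""
    peaks, dips = [], []
    for i in range(1, len(counts) - 1):
        prev, cur, nxt = counts[i - 1], counts[i], counts[i + 1]
        if cur > prev and cur > nxt:
            peaks.append(i)
        elif cur < prev and cur < nxt:
            dips.append(i)
    return peaks, dips


# Hypothetical weekly counts, list index = week number (made-up data)
spiky = [0, 1, 2, 1, 2, 40, 6, 3, 2, 1]
flat = [8, 9, 10, 11, 10, 12, 11, 9, 10, 10]

print(round(peak_to_mean(spiky), 2))  # 6.9
print(round(peak_to_mean(flat), 2))   # 1.2
print(local_extrema(spiky))           # ([2, 5], [3])
```

A strict neighbour comparison like this flags every small wiggle, so on real, noisier series some smoothing or a minimum-prominence threshold would probably be needed before reading much into a single dip.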



  • 10:00 Meeting