Monthly Archives: March 2022

Phil 3.12.2022

Need to remember this!

HATHI 1M: Introducing a Million Page Historical Prose Dataset in English from the Hathi Trust

  • We present a new dataset built on prior work consisting of 1,671,370 randomly sampled pages of English-language prose roughly divided between modes of fictional and non-fictional writing and published between the years 1800 and 2000. In addition to focusing on the “page” as the basic bibliographic unit, our work employs a single predictive model for the historical period under consideration in contrast to prior work. Besides publication metadata, we also provide an enriched feature set of 107 features including part-of-speech tags, sentiment scores, word supersenses and more. Our data is designed to give researchers in the digital humanities large yet portable random samples of historical writing across two foundational modes of English prose writing. We present initial insights into transformations of linguistic patterns across this historical period using our enriched features as possible pointers to future work. The data can be accessed at https://doi.org/10.7910/DVN/HAKKUA.

Rhinocéros! presents a small town overrun with radical ideas, clashing ideology and not-so-subtle transformations. When Bérenger, a local drunk, finds himself surrounded by neighbors who are slowly turning into giant beasts, he’s forced to navigate a new world where the rights of citizens are changing as rapidly as the body of the mob around him.

Phil 3.10.2022

https://twitter.com/marktenenholtz/status/1501905740813848582

Book

  • Scanned content from Social Dominance and Hierarchy in the Forest

SBIRs

  • 9:15 Standup – done
  • Demo slides – done
  • Contract kickoff
  • Data science tagup delayed
  • Possible meeting in Moorestown next week?

GPT Agents

  • Make a base App class that has file loading, terminate, and implement_me callbacks – done
  • Create the Google app – lacked the will to do this
  • Modify the Wiki, GPT, and Twitter apps to use the base class – done
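The base-class refactor above might look roughly like the following minimal sketch. The names `AppBase`, `load_file`, `terminate`, `implement_me`, and `WikiApp` are hypothetical stand-ins for the real classes, and any GUI code the actual apps carry is left out.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class AppBase(ABC):
    """Hypothetical base class holding behavior shared by the Wiki, GPT, and Twitter apps."""

    def __init__(self, name: str):
        self.name = name
        self.running = True
        self.text = ""

    def load_file(self, path: str) -> str:
        # Shared file-loading callback: read a UTF-8 text file and keep its contents.
        self.text = Path(path).read_text(encoding="utf-8")
        return self.text

    def terminate(self) -> None:
        # Shared shutdown callback; subclasses can extend it to release resources.
        self.running = False

    @abstractmethod
    def implement_me(self) -> str:
        # App-specific behavior; every subclass must override this.
        ...


class WikiApp(AppBase):
    """Hypothetical subclass standing in for the Wiki app."""

    def implement_me(self) -> str:
        # Placeholder app logic: report how many words were loaded.
        return f"{self.name}: {len(self.text.split())} words loaded"
```

Making `implement_me` abstract means `AppBase` itself can't be instantiated (Python raises `TypeError`), so each app is forced to supply its own behavior while inheriting file loading and shutdown.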

Phil 3.9.2022

Book

  • More on the deep bias chapter
  • Realized that Hofstede’s cultural dimensions are evenly split between nomad/stampede and dominance/parity

SBIRs

  • IRAD Monthly meeting
  • Meeting with Steve
  • Created the RCSNN GitHub repo

GPT Agents

  • Going to make a set of small apps so that we can compare GPT, Wikipedia, and Google search more directly. Got a basic Google Custom Search Engine running. Here’s the output for “slang for COVID-19”:
Gen Z Slang for the Coronavirus Pandemic: Miss Rona, Coronacation:
	link = www.businessinsider.com
	snippet = Apr 8, 2020 ... Miss Rona / The Rona — An abbreviation for the coronavirus. Some have called it "Miss Rona," adding the "Miss" to denote personality and "sass" ...

Decoding coronavirus slang, from quarantinis to magpies, covidiots ...:
	link = news.google.com
	snippet = Jun 13, 2020 ... Coronavirus slang · Magpie — to snatch up desirable staples in the supermarket, like toilet paper or pasta. · Covidiot — An insult for someone who ...

New Words We Created Because Of Coronavirus - Dictionary.com:
	link = www.dictionary.com
	snippet = Sep 15, 2020 ... covidiot. A blend of COVID-19 and idiot, covidiot is a slang insult for someone who disregards healthy and safety guidelines about the novel ...

Covid-19 Phrases and Slang That Are Now Commonplace ...:
	link = blog.cheapism.com
	snippet = Jan 7, 2022 ... Another new entry in the Merriam-Webster dictionary, this term refers to those with COVID-19 who are highly contagious and capable of ...
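The printed results above appear to be the `title`, `displayLink`, and `snippet` fields of each entry in the `items` list of a Custom Search JSON API response. A minimal sketch of that formatting step, assuming the response has already been fetched and parsed into a dict (no network call is shown, and `format_results` is a hypothetical helper):

```python
def format_results(response: dict) -> str:
    """Render parsed Custom Search results as 'title:' / 'link =' / 'snippet =' blocks."""
    blocks = []
    # A response with no hits may omit "items" entirely, so default to an empty list.
    for item in response.get("items", []):
        title = item.get("title", "")
        # Prefer the short display domain; fall back to the full URL if it is missing.
        link = item.get("displayLink") or item.get("link", "")
        # Snippets can contain line breaks; flatten them onto one line.
        snippet = item.get("snippet", "").replace("\n", " ")
        blocks.append(f"{title}:\n\tlink = {link}\n\tsnippet = {snippet}")
    # Separate results with a blank line, matching the output above.
    return "\n\n".join(blocks)
```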

Phil 3.8.2022

Kubric: A scalable dataset generator

  • Data is the driving force of machine learning, with the amount and quality of training data often being more important for the performance of a system than architecture and training details. But collecting, processing and annotating real data at scale is difficult, expensive, and frequently raises additional privacy, fairness and legal concerns. Synthetic data is a powerful tool with the potential to address these shortcomings: 1) it is cheap, 2) it supports rich ground-truth annotations, 3) it offers full control over data, and 4) it can circumvent or mitigate problems regarding bias, privacy and licensing. Unfortunately, software tools for effective data generation are less mature than those for architecture design and training, which leads to fragmented generation efforts. To address these problems we introduce Kubric, an open-source Python framework that interfaces with PyBullet and Blender to generate photo-realistic scenes with rich annotations, and that seamlessly scales to large jobs distributed over thousands of machines, generating TBs of data. We demonstrate the effectiveness of Kubric by presenting a series of 13 different generated datasets for tasks ranging from studying 3D NeRF models to optical flow estimation. We release Kubric, the assets used, all of the generation code, and the rendered datasets for reuse and modification.

Tasks

  • Please call 1-888-692-4560 to arrange an appointment
  • Lights – done!
  • Outlaw – pinged for today
  • Physical
  • Lawn – done

SBIRs

  • 12:00 SBIR kickoff review
  • 1:30 Standup

GPT Agents

  • 3:30 UMBC meeting

Phil 3.7.2022

GPT Agents

  • Fix select on the Wiki App – done
  • Add SharedObjects to load from file or environment variable – done
  • Add documentation
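One way to let a shared-settings object take a value (say, an API key) from either an environment variable or a file is sketched below. The real `SharedObjects` interface isn't shown in these notes, so the helper name `load_setting`, the precedence order (environment first, then file), and the `KEY=value` file format are all assumptions.

```python
import os
from pathlib import Path
from typing import Optional


def load_setting(name: str, path: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: return a setting from the environment, else from a KEY=value file."""
    # 1. A non-empty environment variable wins.
    value = os.environ.get(name)
    if value:
        return value
    # 2. Otherwise look for a 'NAME=value' line in the optional file.
    if path and Path(path).is_file():
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip blank lines, comments, and lines without '='.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, val = line.partition("=")
            if key.strip() == name:
                return val.strip()
    # 3. Not found anywhere.
    return None
```

Checking the environment first lets a deployment override whatever is in a checked-in settings file without editing it.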

SBIRs

  • Prep slides for IRAD – done
  • Prep slides for MDA – started
  • Meeting with Ron – done

Book

  • More deep bias

Phil 3.5.2022

SBIRs

  • Wound up helping out Dave on the technical section for about 2 hours and Val for about an hour
  • Went to see Aaron, who is doing better. Going to keep him out of meetings for at least a few weeks. Also, it seems that he already did the slides for Thursday?
  • Maybe get T moved over?

Phil 3.4.2022

Book

  • Working on cruelty

SBIRs

  • 10:00 Meeting with James – done
  • Write a bunch of stories – done

GPT Agents

  • Finish Wiki tool? Getting there! Done!
Note the big green spike in December!

Phil 3.3.2022

Book

  • Spent a good deal of time researching when “cruelty is the point” started. Google’s daterange search was not much help, but Twitter’s advanced search has good date-range features. Going to integrate them into the tool.

GPT Agents

  • Wikipedia tool
  • Add launching of Twitter pages with search terms and date ranges
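Launching a Twitter search for a phrase within a date range comes down to building a search URL that uses Twitter’s `since:` and `until:` search operators. A minimal sketch, assuming the `https://twitter.com/search?q=...` URL form and that those operators keep behaving as they did in early 2022; `twitter_search_url` and `launch_search` are hypothetical helpers.

```python
import urllib.parse
import webbrowser
from datetime import date


def twitter_search_url(phrase: str, since: date, until: date) -> str:
    """Build a Twitter search URL for an exact phrase between two dates."""
    # Quote the phrase for an exact match, then append the date-range operators.
    query = f'"{phrase}" since:{since.isoformat()} until:{until.isoformat()}'
    # urlencode escapes quotes (%22), spaces ('+'), and colons (%3A).
    return "https://twitter.com/search?" + urllib.parse.urlencode({"q": query})


def launch_search(phrase: str, since: date, until: date) -> None:
    # Open the search in the user's default browser.
    webbrowser.open(twitter_search_url(phrase, since, until))
```

For example, `launch_search("cruelty is the point", date(2016, 1, 1), date(2017, 1, 1))` would open the search for that phrase restricted to 2016.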

SBIRs

  • 9:30 Standup
  • 10:00 Meeting with Orest
  • 10:30 Meeting with Rukan
  • 11:30 Architecture meeting
  • 1:00 Phase II intro meeting
  • 2:00 CSC followup
  • 3:00 Meeting with Carmine

Phil 3.2.2022

Not sure if this is true, but it wouldn’t surprise me

Tasks

  • Ping Outlaw – done
  • Physical
  • Lawn

GPT Agents

First test

SBIRs

  • IP Doc – done
  • RCSNN GitHub
  • Timesheet crap – done
  • Chat with Ron about his student’s project
  • Gotta read ANOTHER SBIR by 10:00 tomorrow

Book

  • Worked on deep bias for causing harm