Monthly Archives: March 2022

Phil 3.12.2022

Need to remember this!

HATHI 1M: Introducing a Million Page Historical Prose Dataset in English from the Hathi Trust

We present a new dataset built on prior work consisting of 1,671,370 randomly sampled pages of English-language prose roughly divided between modes of fictional and non-fictional writing and published between the years 1800 and 2000. In addition to focusing on the “page” as the basic bibliographic unit, our work employs a single predictive model for the historical period under consideration in contrast to prior work. Besides publication metadata, we also provide an enriched feature set of 107 features including part-of-speech tags, sentiment scores, word supersenses and more. Our data is designed to give researchers in the digital humanities large yet portable random samples of historical writing across two foundational modes of English prose writing. We present initial insights into transformations of linguistic patterns across this historical period using our enriched features as possible pointers to future work. The data can be accessed at https://doi.org/10.7910/DVN/HAKKUA.

Rhinocéros! presents a small town overrun with radical ideas, clashing ideology and not so subtle transformations. When Beringer, a local drunk, finds himself surrounded by neighbors who are slowly turning into giant beasts, he’s forced to navigate a new world where the rights of citizens are changing as rapidly as the body of the mob around him.

Phil 3.10.2022

https://twitter.com/marktenenholtz/status/1501905740813848582

Book

Scanned content from Social Dominance and Hierarchy in the Forest

SBIRs

9:15 Standup – done
Demo slides – done
Contract kickoff
Data science tagup delayed
Possible meeting in Moorestown next week?

GPT Agents

Make a base App class that has file loading, terminate, and implement_me callbacks – done
Create the Google app – lacked the will to do this
Modify the Wiki, GPT, and Twitter apps to use the base class – done
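
A rough sketch of what a shared base class with these three callbacks could look like. This is not the actual code: the class names (AppBase, WikiApp), the method names, and the choice of JSON for file loading are all assumptions made for illustration, and the GUI wiring is left out.

```python
import json
from pathlib import Path
from typing import Any, Dict


class AppBase:
    """Hypothetical base class; names are illustrative, not the real app's."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running = True
        self.data: Dict[str, Any] = {}

    def load_file_callback(self, path: str) -> Dict[str, Any]:
        """Load a JSON file (assumed format) and keep its contents."""
        self.data = json.loads(Path(path).read_text(encoding="utf-8"))
        return self.data

    def terminate(self) -> None:
        """Mark the app as stopped; subclasses can extend this for cleanup."""
        self.running = False

    def implement_me(self) -> str:
        """Placeholder callback for features that are not wired up yet."""
        message = f"{self.name}: implement me!"
        print(message)
        return message


class WikiApp(AppBase):
    """Example subclass that inherits all three callbacks unchanged."""

    def __init__(self) -> None:
        super().__init__("WikiApp")
```

With this shape, each app only overrides what differs, and any unfinished menu item can point at implement_me until it gets real behavior.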

Phil 3.9.2022

Book

More on the deep bias chapter
Realized that Hofstede’s cultural dimensions are evenly split between nomad/stampede and dominance/parity

SBIRs

IRAD Monthly meeting
Meeting with Steve
Created the RCSNN Github repo

GPT Agents

Going to make a set of small apps that we can use to compare GPT, Wikipedia, and Google search more directly. Got a basic Google Custom Search Engine running. Here’s the output for “slang for COVID-19”:

Gen Z Slang for the Coronavirus Pandemic: Miss Rona, Coronacation:
	link = www.businessinsider.com
	snippet = Apr 8, 2020 ... Miss Rona / The Rona — An abbreviation for the coronavirus. Some have called it "Miss Rona," adding the "Miss" to denote personality and "sass" ...

Decoding coronavirus slang, from quarantinis to magpies, covidiots ...:
	link = news.google.com
	snippet = Jun 13, 2020 ... Coronavirus slang · Magpie — to snatch up desirable staples in the supermarket, like toilet paper or pasta. · Covidiot — An insult for someone who ...

New Words We Created Because Of Coronavirus - Dictionary.com:
	link = www.dictionary.com
	snippet = Sep 15, 2020 ... covidiot. A blend of COVID-19 and idiot, covidiot is a slang insult for someone who disregards healthy and safety guidelines about the novel ...

Covid-19 Phrases and Slang That Are Now Commonplace ...:
	link = blog.cheapism.com
	snippet = Jan 7, 2022 ... Another new entry in the Merriam-Webster dictionary, this term refers to those with COVID-19 who are highly contagious and capable of ...
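
For reference, a minimal sketch of how output in this shape could be produced from a Custom Search JSON API response. The function names are hypothetical, the key and engine ID are placeholders, and the assumption is that the "link" shown above is the response's displayLink field. The HTTP request itself is left out so the formatting can be checked offline.

```python
from typing import Any, Dict
from urllib.parse import urlencode

# Endpoint of the Custom Search JSON API
API_URL = "https://www.googleapis.com/customsearch/v1"


def build_query_url(query: str, api_key: str, engine_id: str) -> str:
    """Build the request URL; api_key and engine_id are placeholders."""
    params = {"key": api_key, "cx": engine_id, "q": query}
    return f"{API_URL}?{urlencode(params)}"


def format_results(response: Dict[str, Any]) -> str:
    """Render each result as 'title:' followed by tabbed link and snippet."""
    blocks = []
    for item in response.get("items", []):
        title = item.get("title", "")
        link = item.get("displayLink", "")  # assumed source of the 'link' line
        snippet = item.get("snippet", "")
        blocks.append(f"{title}:\n\tlink = {link}\n\tsnippet = {snippet}")
    return "\n\n".join(blocks)
```

Fetching build_query_url(...) with any HTTP client and passing the parsed JSON to format_results should give text laid out like the listing above.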

Phil 3.8.2022

Kubric: A scalable dataset generator

Data is the driving force of machine learning, with the amount and quality of training data often being more important for the performance of a system than architecture and training details. But collecting, processing and annotating real data at scale is difficult, expensive, and frequently raises additional privacy, fairness and legal concerns. Synthetic data is a powerful tool with the potential to address these shortcomings: 1) it is cheap 2) supports rich ground-truth annotations 3) offers full control over data and 4) can circumvent or mitigate problems regarding bias, privacy and licensing. Unfortunately, software tools for effective data generation are less mature than those for architecture design and training, which leads to fragmented generation efforts. To address these problems we introduce Kubric, an open-source Python framework that interfaces with PyBullet and Blender to generate photo-realistic scenes, with rich annotations, and seamlessly scales to large jobs distributed over thousands of machines, and generating TBs of data. We demonstrate the effectiveness of Kubric by presenting a series of 13 different generated datasets for tasks ranging from studying 3D NeRF models to optical flow estimation. We release Kubric, the used assets, all of the generation code, as well as the rendered datasets for reuse and modification.

Tasks

Please call 1-888-692-4560 to arrange an appointment
Lights – done!
Outlaw – pinged for today
Physical
Lawn – done

SBIRs

12:00 SBIR kickoff review
1:30 Standup

GPT Agents

3:30 UMBC meeting

Phil 3.7.2022

GPT Agents

Fix select on the Wiki App – done
Add SharedObjects to load from file or environment variable – done
Add documentation

SBIRs

Prep slides for IRAD – done
Prep slides for MDA – started
Meeting with Ron – done

Book

More deep bias

3.5.2022

SBIRs

Wound up helping out Dave on the technical section for about 2 hours and Val for about an hour
Went to see Aaron, who is doing better. Going to keep him out of meetings for at least a few weeks. Also, it seems that he already did the slides for Thursday?
Maybe get T moved over?

Phil 3.4.2022

Book

Working on cruelty

SBIRs

10:00 Meeting with James – done
Write a bunch of stories – done

GPT Agents

Finish Wiki tool? Getting there! Done!

Phil 3.3.2022

Book

Spent a good deal of time researching when “cruelty is the point” started. Google’s daterange search was not much help, but Twitter’s advanced search has good features. Going to integrate them into the tool.

GPT Agents

Wikipedia tool
Add launching of Twitter pages with search terms and date ranges
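
A possible sketch of the launch step, assuming the tool just opens a browser tab on a Twitter search URL that uses the since: and until: operators. The function names are made up for illustration, and any dates passed in are up to the caller.

```python
import webbrowser
from datetime import date
from urllib.parse import urlencode


def twitter_search_url(terms: str, since: date, until: date) -> str:
    """Build a Twitter search URL restricted to a date range."""
    # since:/until: are Twitter search operators taking YYYY-MM-DD dates
    query = f"{terms} since:{since.isoformat()} until:{until.isoformat()}"
    return "https://twitter.com/search?" + urlencode({"q": query})


def launch_twitter_search(terms: str, since: date, until: date) -> bool:
    """Open the search in the default browser; returns webbrowser's result."""
    return webbrowser.open(twitter_search_url(terms, since, until))
```

Wrapping the phrase in double quotes (for example '"cruelty is the point"') keeps the search to the exact wording.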

SBIRs

9:30 Standup
10:00 Meeting with Orest
10:30 Meeting with Rukan
11:30 Architecture meeting
1:00 Phase II intro meeting
2:00 CSC followup
3:00 Meeting with Carmine

Phil 3.2.2022

Not sure if this is true, but it wouldn’t surprise me

Tasks

Ping Outlaw – done
Physical
Lawn

GPT Agents

IntelliJ can build a requirements.txt file (www.jetbrains.com/help/pycharm/managing-dependencies.html#populate_dependency_files)
App directory
TweetCountExplorer – done
WikiPageviewExplorer – started
Failover to opening json token file if env variables aren’t found
Got the Twitter app running!
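
The failover idea, sketched under some assumptions: tokens are looked up by name in the environment first, and only the missing ones are read from a JSON file that maps names to values. The function name load_tokens and the file layout are hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Dict, Iterable


def load_tokens(names: Iterable[str], json_path: str) -> Dict[str, str]:
    """Read tokens from env variables, falling back to a JSON file."""
    names = list(names)
    # First choice: environment variables
    tokens = {n: os.environ[n] for n in names if n in os.environ}
    missing = [n for n in names if n not in tokens]
    path = Path(json_path)
    if missing and path.is_file():
        # Failover: a JSON object of {name: value} (assumed layout)
        file_tokens = json.loads(path.read_text(encoding="utf-8"))
        for n in missing:
            if n in file_tokens:
                tokens[n] = file_tokens[n]
    still_missing = [n for n in names if n not in tokens]
    if still_missing:
        raise KeyError("Missing tokens: " + ", ".join(still_missing))
    return tokens
```

Environment variables win when both sources define the same name, so a local token file never silently overrides a deployed configuration.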

SBIRs

IP Doc – done
RCSNN Github
Timesheet crap – done
Chat with Ron about his student’s project
Gotta read ANOTHER SBIR by 10:00 tomorrow

Book

Worked on deep bias for causing harm

Phil 3.1.2022

We Are the 25%: Looking at Street Area Percentages and Surface Parking

Tasks

Ping Outlaw – done
Yard
AC – done
Physical

GPT Agents

Set up cookie cutter GitHub project for KeywordExplorer
3:30 Meeting

SBIRs

9:30 Sprint Planning
1:30 data rights
Send Orest info on Aaron

viztales

Dimension reduction, State, Orientation, and Speed
