Phil 10.4.2021

Wheel!

Book

GPT-Agents

  • Start LIWC csv reader – got the reader and counts done. Need to see if I can do the rest in Excel (a minimal counting sketch follows this list)
  • Ping Andreea – done!
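
A minimal sketch of the kind of category counting the LIWC reader item above describes, assuming a hypothetical liwc_categories.csv with word,category rows and a plain-text corpus file; the file names and layout are illustrative, not the actual reader or the real LIWC dictionary format:

import csv
from collections import Counter, defaultdict

def load_categories(path: str) -> dict:
    # hypothetical layout: one "word,category" pair per row
    word_to_cats = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            word_to_cats[row["word"].lower()].append(row["category"])
    return word_to_cats

def count_categories(text: str, word_to_cats: dict) -> Counter:
    counts = Counter()
    for token in text.lower().split():
        for cat in word_to_cats.get(token.strip(".,!?"), []):
            counts[cat] += 1
    return counts

if __name__ == "__main__":
    cats = load_categories("liwc_categories.csv")
    with open("reviews.txt", encoding="utf-8") as f:
        print(count_categories(f.read(), cats).most_common(10))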

SBIRs

  • Expense report – done again, fingers crossed!
  • current parent/child node logic
  • Got most of the logic together for setting the parent group and selected node. I’m not really happy about this; there are too many states and hidden relationships. This will need a cleanup once it is working. I may be able to do a good deal by looking at the node info, though

Phil 10.2.2021

Rubrix is a production-ready Python framework for exploring, annotating, and managing data in NLP projects.

Key features:

  • Open: Rubrix is free, open-source, and 100% compatible with major NLP libraries (Hugging Face transformers, spaCy, Stanford Stanza, Flair, etc.). In fact, you can use and combine your preferred libraries without implementing any specific interface.
  • End-to-end: Most annotation tools treat data collection as a one-off activity at the beginning of each project. In real-world projects, data collection is a key activity of the iterative process of ML model development. Once a model goes into production, you want to monitor and analyze its predictions, and collect more data to improve your model over time. Rubrix is designed to close this gap, enabling you to iterate as much as you need.
  • User and Developer Experience: The key to sustainable NLP solutions is to make it easier for everyone to contribute to projects. Domain experts should feel comfortable interpreting and annotating data. Data scientists should feel free to experiment and iterate. Engineers should feel in control of data pipelines. Rubrix optimizes the experience for these core users to make your teams more productive.
  • Beyond hand-labeling: Classical hand labeling workflows are costly and inefficient, but having humans-in-the-loop is essential. Easily combine hand-labeling with active learning, bulk-labeling, zero-shot models, and weak-supervision in novel data annotation workflows.
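
As a rough illustration of the workflow described in the feature list above, here is a sketch of logging model predictions for later exploration and annotation. It follows the Rubrix docs from around this time (rb.log and rb.TextClassificationRecord); the exact argument names may differ between versions, and the dataset name, labels, and text are made up:

import rubrix as rb  # assumes a running Rubrix server (default http://localhost:6900)

record = rb.TextClassificationRecord(
    inputs={"text": "The food was great but the service was slow."},
    prediction=[("positive", 0.7), ("negative", 0.3)],
    prediction_agent="my-sentiment-model",   # illustrative model name
)
rb.log(records=record, name="restaurant_reviews")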

Phil 10.1.2021

I’m really not ready for October

JuryRoom

  • Aaron, Panos and I had a walkthrough of Jarod’s SLR. It needs a lot of organizing and framing. The information is all there, but it’s more like a collection of notes than a paper.

SBIRs

  • LAIC meeting went well, but it got a little shaky. The upshot is that we will probably demo the navigator app first, and the builder app second. The use case can still be rules of engagement, where we can show a script unfolding where the rules are followed (staying close) in response to an event, and the rules being disregarded. Then we can show how the map is made.
  • Need to start wiring up the text to the graph
  • Working out adding seeds or topics to groups and getting details

Phil 9.30.2021

Lost my UMBC privileges until the approval goes back through. Darn!

SBIRs

  • 9:15 Standup
  • 11:00 LAIC meeting
  • Working on getting the topics and groups to work together – done
  • Need to tie in the map data objects and then get GPT, then DB access running

Phil 9.29.2021

rliable is an open-source Python library for reliable evaluation, even with a handful of runs, on reinforcement learning and machine learning benchmarks.

Deep Reinforcement Learning at the Edge of the Statistical Precipice

  • Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally-demanding benchmarks has led to the practice of evaluating only a small number of runs per task, exacerbating the statistical uncertainty in point estimates. In this paper, we argue that reliable evaluation in the few run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field. We illustrate this point using a case study on the Atari 100k benchmark, where we find substantial discrepancies between conclusions drawn from point estimates alone versus a more thorough statistical analysis. With the aim of increasing the field’s confidence in reported results with a handful of runs, we advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results, as well as present more robust and efficient aggregate metrics, such as interquartile mean scores, to achieve small uncertainty in results. Using such statistical tools, we scrutinize performance evaluations of existing algorithms on other widely used RL benchmarks including the ALE, Procgen, and the DeepMind Control Suite, again revealing discrepancies in prior comparisons. Our findings call for a change in how we evaluate performance in deep RL, for which we present a more rigorous evaluation methodology, accompanied with an open-source library rliable, to prevent unreliable results from stagnating the field.
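
The abstract's core recommendation (interquartile-mean scores plus bootstrap interval estimates, rather than bare means or medians) can be illustrated without the library itself. A numpy/scipy-only sketch, with made-up scores standing in for per-run, per-task results; note that rliable itself uses a stratified bootstrap, so this is a simplification:

import numpy as np
from scipy.stats import trim_mean

rng = np.random.default_rng(0)
# made-up data: 5 runs x 10 tasks of normalized scores
scores = rng.normal(loc=0.5, scale=0.2, size=(5, 10))

def iqm(x):
    # interquartile mean: drop the lowest and highest 25% of all scores, average the rest
    return trim_mean(x.ravel(), proportiontocut=0.25)

def bootstrap_ci(x, reps=2000, alpha=0.05):
    # percentile bootstrap over runs (resample rows with replacement)
    stats = [iqm(x[rng.integers(0, len(x), len(x))]) for _ in range(reps)]
    return np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])

print("IQM:", iqm(scores), "95% CI:", bootstrap_ci(scores))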

GPT Agents

  • Put together spreadsheets for French, Chinese, Mexican and American LIWC results
  • 4:15 Meeting

SBIRs

  • Promoting generic params objects to their own classes with default values (see the sketch after this list)
  • Working out seeds and visited topics – done!
  • Getting the adding and using of topic groups. This means adding a method that supports callbacks to external (parent) code, so that I can keep the two components coordinated
  • 10:30 BAA meeting
  • 11:30 Web framework meeting
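
Re the params-object promotion above: a minimal sketch of what "their own classes with default values" might look like using dataclasses. The class and field names here are invented for illustration, not the actual params objects:

from dataclasses import dataclass, field, asdict

@dataclass
class GraphParams:          # invented name; stands in for one promoted params object
    spring_constant: float = 0.01
    damping: float = 0.9
    max_iterations: int = 500

@dataclass
class QueryParams:          # another illustrative params class
    model: str = "text-curie-001"   # placeholder model name
    max_tokens: int = 128
    seeds: list = field(default_factory=list)

gp = GraphParams(damping=0.8)   # override one default, keep the rest
print(asdict(gp))               # easy to serialize alongside the project data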

JuryRoom

  • 7:00 Meeting

Phil 9.28.2021

Tr

GPT Agents

  • Working on getting the LIWC data organized – done!

SBIRs

  • Sprint Planning
  • Finished(?) with the parameter objects. I may change the file to MapsDataObjects and inherit from the serializing base class once I see how everything works with this approach.
  • Created PickleDataObjectBase and refactored. Loading and saving seems to be working fine (a rough sketch of the idea follows this list)
  • Ok, back to GraphBuilder. Added save, load, and exit callback attributes that can be passed a function from the main application. That should let me get the data out of the ProjectWindow class
  • Callbacks are working!
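
A rough sketch of the two ideas above: a pickle-backed base class for saving and loading, and a component that takes save/load/exit callbacks from the main application. Method and attribute names are guesses for illustration, not the actual classes:

import pickle

class PickleDataObjectBase:
    # illustrative base class: any subclass can round-trip itself via pickle
    def save(self, filename: str):
        with open(filename, "wb") as f:
            pickle.dump(self, f)

    @staticmethod
    def load(filename: str):
        with open(filename, "rb") as f:
            return pickle.load(f)

class GraphBuilder:
    # sketch of wiring callbacks in from the parent app instead of owning the data
    def __init__(self, save_callback=None, load_callback=None, exit_callback=None):
        self.save_callback = save_callback
        self.load_callback = load_callback
        self.exit_callback = exit_callback

    def on_save_pressed(self):
        if self.save_callback:
            self.save_callback()  # the parent (e.g. ProjectWindow) decides what to persist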

Phil 9.27.2021

ryanjgallagher/focalevents

GPT Agents

  • See what LIWC data I have and then get the rest analyzed

SBIRs

  • Sprint demos
  • Working on being able to pass complex data objects cleanly. Reworked everything to use pickle, which serializes objects nicely and produces binary strings, which should be more compact too (a tiny sketch follows this list).
  • Meeting with John and Aaron about webapps
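
A tiny sketch of the pickle approach described above: an arbitrary nested object becomes a single binary blob that can be passed around and restored. The class here is an invented stand-in for a "complex data object":

import pickle

class NodeInfo:                        # invented stand-in class
    def __init__(self, name, neighbors):
        self.name = name
        self.neighbors = neighbors     # nested structures pickle without extra work

blob = pickle.dumps(NodeInfo("topic_1", {"topic_2": 0.7, "topic_3": 0.2}))
print(type(blob), len(blob))           # a compact bytes object
restored = pickle.loads(blob)
print(restored.name, restored.neighbors)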

Phil 9.25.2021

Today’s ride

Neutral bots probe political bias on social media

  • Social media platforms attempting to curb abuse and misinformation have been accused of political bias. We deploy neutral social bots who start following different news sources on Twitter, and track them to probe distinct biases emerging from platform mechanisms versus user interactions. We find no strong or consistent evidence of political bias in the news feed. Despite this, the news and information to which U.S. Twitter users are exposed depend strongly on the political leaning of their early connections. The interactions of conservative accounts are skewed toward the right, whereas liberal accounts are exposed to moderate content shifting their experience toward the political center. Partisan accounts, especially conservative ones, tend to receive more followers and follow more automated accounts. Conservative accounts also find themselves in denser communities and are exposed to more low-credibility content.

We are very pleased to announce the release of scikit-learn 1.0! The library has been stable for quite some time, releasing version 1.0 is recognizing that and signalling it to our users. This release does not include any breaking changes apart from the usual two-release deprecation cycle. For the future, we do our best to keep this pattern.

Phil 9.24.2021

  • Can a poem be alive? Can we drink poetry? In this event, we explore these ideas through a bioart project. Raaz is a multimedia installation with a poetry-infused bottle of wine surrounded by audio-visual representations of a 14th-century Persian poem on transformation. The transgenic yeast used to make the wine included an encoding of the poem. The installation creates a meditative space surrounded by microscopic images of the yeast and ambient audio that combines a reading of the poem, its Morse code, and an original bass flute melody. During the event you can experience Raaz in-person, hear about the artists’ process and motivations in their opening talk, and participate in a hands-on agar art activity for individuals of all ages!

It’s only just Fall, but the rains came through yesterday and the weather has shifted. The mornings are chilly and the air is drier.

I’m also not getting any traction on the book, so I have to figure out what to do next. Maybe a more academic press? Do you need an agent for that? Anyway, I’m procrastinating on doing any research-y things today and am just going to write some nice therapeutic code.

SBIRs

  • Figured out the problem I was having with scaled and scrolled canvas. The trick is to use the tk.Canvas.canvasx(event.x) and tk.Canvas.canvasy(event.y) calls, which map the window mouse X,Y points to the (larger) canvas coordinates (a minimal sketch follows this list):
Still having a bit of trouble clicking on some nodes though…
  • The other thing that I realize is that I think I want to send data around in JSON files so that the transition to a webapp is easier. Refactoring to support this.
  • Loading and saving out JSON project files:
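
A minimal sketch of the coordinate fix described above, for picking items on a scrolled canvas; the widget sizes and the single test node are illustrative:

import tkinter as tk

root = tk.Tk()
canvas = tk.Canvas(root, width=400, height=300, scrollregion=(0, 0, 2000, 2000))
vbar = tk.Scrollbar(root, orient=tk.VERTICAL, command=canvas.yview)
canvas.config(yscrollcommand=vbar.set)
vbar.pack(side=tk.RIGHT, fill=tk.Y)
canvas.pack(side=tk.LEFT, fill=tk.BOTH, expand=True)
canvas.create_oval(500, 500, 540, 540, fill="red", tags="node")

def on_click(event):
    # event.x/y are window coordinates; convert to canvas coordinates before hit-testing
    cx, cy = canvas.canvasx(event.x), canvas.canvasy(event.y)
    hits = canvas.find_overlapping(cx - 1, cy - 1, cx + 1, cy + 1)
    print("canvas point:", cx, cy, "items hit:", hits)

canvas.bind("<Button-1>", on_click)
root.mainloop()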

Phil 9.23.2021

Schedule winterizing!

Summarizing Books with Human Feedback

What’s Working and What Isn’t in Researching Influence Operations?

  • The emergence of a field devoted to researching, and countering, influence operations is something I have watched closely. In 2014, I channeled a fascination with propaganda from the two world wars into researching how the phenomena was changing in a digital age. In those early days, there were few places to find work researching influence operations. The career paths were mostly in academia or in the military or intelligence services. Marrying the two, I chose to pursue a doctorate in war studies. Along the way, I have worked with tech companies, militaries, civil society groups, and governments, learning how each understands and works to counter (and sometimes run) influence operations.
“Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?” http://openreview.net/pdf?id=Wrtp36cbl61

GPT Agents

  • Meeting yesterday. One of the things that came up was that the GPT struggled to train against the French corpora more than the others, as measured by out-of-band responses. Need to see what’s going on here. Also, does this show up in LIWC?

SBIRs

  • 9:15 standup – done
  • 11:00 LAIC weekly – went well
  • 12:00 Performance Engineering presentation – missed it
  • 2:00 Adversarial learning weekly – Rukan is working on other things right now, so it was more of a chat. We spent a good deal of time talking about the Pandemonium Cognitive model.
  • 3:30 Monthly Data Science tagup. Cool presentation on LANDSAT
  • 4:00 Phone interview – That was fun!
  • GUI
    • Trying to figure out why the scrollbars mess up the node picking
    • Adding project setup page

Phil 9.22.2021

Wrote a quick paragraph for Stacey

Order masks!

GPT Agents

  • Got the data from the ground truth and runs. Need to combine, then do a rollup of American, French, Chinese, and Mexican
  • 4:14 Meeting

SBIRs

  • More GUI. Need to be able to drag selected nodes around. Got it working well enough for the first pass. I need to fix it so the link back to the node is also moved:
Select!
  • There was a lot of stuff to work through with events and binds, which this tutorial really helped with: python-course.eu/tkinter_events_binds (a minimal drag sketch follows this list)
  • Start pulling data in from OpenAI, using a cheaper model
  • Put together a putative project file
  • Send an email to Aaron and Orest about OpenAI expected charges
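
Re the node-dragging item above: a minimal sketch of the tag-bind pattern for dragging canvas items, in the spirit of the linked tutorial. The item and tag names are illustrative:

import tkinter as tk

root = tk.Tk()
canvas = tk.Canvas(root, width=400, height=300)
canvas.pack()
canvas.create_oval(100, 100, 140, 140, fill="steelblue", tags="node")

drag = {"x": 0, "y": 0}

def on_press(event):
    drag["x"], drag["y"] = event.x, event.y

def on_drag(event):
    dx, dy = event.x - drag["x"], event.y - drag["y"]
    canvas.move("current", dx, dy)   # "current" = the item under the pointer
    drag["x"], drag["y"] = event.x, event.y
    # a real version would also update the lines linking this node to its neighbors

canvas.tag_bind("node", "<ButtonPress-1>", on_press)
canvas.tag_bind("node", "<B1-Motion>", on_drag)
root.mainloop()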

JuryRoom

  • 7:00 Meeting

Phil 9.21.2021

Is it the first day of Fall? Nope, tomorrow

Two passes by agents for the book. Sigh. Should I start thinking about turning chapters into articles?

Is The Far-Right’s Praise Of The Taliban Indicative Of Something Bigger?

  • Radical ideologies can and do inevitably interconnect historically; the far-right’s fascination with radical forms of Islam is not a new phenomenon and while it may have been accentuated by militant Islamists’ fame, we cannot overlook these movements deep historic roots and how historical fragments, real or imagined, intersect to inspire elements of the far-right today.

SBIR(s)

  • Helped Aaron yesterday with getting formatting right on the report
  • Stand up today. Mention funding. Come to think of it, add an approximate token price calculator for a session. That info may be in the JSON file that comes back from OpenAI
  • Add a Run/Stop button to the graph display (done!). Will still need to be able to drag around nodes
  • Tkinter canvas zoom + move/pan
Working Scrollbars!
Working zoom!
  • Zoom turned out to be a bit tricky, because the lines were not being scaled and translated along with the nodes. To do this, I needed to use the screen coordinates for the nodes, which I got like this (a mousewheel zoom sketch follows at the end of this list):
def set_screen_coords(self):
    coords = self.cd.canvas.coords(self.id)
    self.cx = (coords[0] + coords[2])/2
    self.cy = (coords[1] + coords[3])/2
  • Which I then combined with the global(!) positions of the nodes to get the combined effect:
self.x += self.dx * elapsed
self.y += self.dy * elapsed
self.cd.canvas.move(self.id, self.dx * elapsed, self.dy * elapsed) # global coords
self.set_screen_coords() # get local coords

for n, l in self.neighbor_dict.items():
    self.cd.canvas.coords(l, self.cx, self.cy, n.cx, n.cy)  # use local coords
  • Got selection working:
  • Got labels working:
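
A sketch of the mousewheel zoom referred to above, using Canvas.scale around the cursor position. The binding shown is the Windows one (event.delta); Linux delivers wheel events as <Button-4>/<Button-5>, and the drawn items are just placeholders:

import tkinter as tk

root = tk.Tk()
canvas = tk.Canvas(root, width=400, height=300)
canvas.pack()
canvas.create_oval(150, 100, 190, 140, fill="tomato")
canvas.create_line(170, 120, 300, 200)

def on_mousewheel(event):
    # scale everything about the cursor, using canvas (not window) coordinates
    factor = 1.1 if event.delta > 0 else 1 / 1.1
    canvas.scale("all", canvas.canvasx(event.x), canvas.canvasy(event.y), factor, factor)

canvas.bind("<MouseWheel>", on_mousewheel)
root.mainloop()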

GPT Agents

  • The French, Chinese, and Mexican reviews are still cooking. It looks like about 8 hours more to finish generation, then the sentiment. So everything will probably be done tomorrow? Then I can run the analysis, and probably start writing.
  • Also, in the app, I want to try hooking the system to the local GPT models, and try to reproduce the chess board.

Phil 9.20.21

Fun AA baseball game yesterday. Baysox go to the playoffs!

JuryRoom

  • Reading Jarod’s SLR

GPT Agents

  • Creating French, Chinese, and Mexican reviews from the models. This will probably take a few days

SBIRs

  • Start adding functionality to UI
  • Add config file for db and OpenAI access. Add popups if accesses are missing
  • Working on getting all the pieces working for drawing a network graph. It’s coming along nicely. I can draw animated nodes and lines now:
  • I found this version of ForceAtlas2 on Github, so I think I can have some level of force-directed network drawing
  • Subclassing ForceNode from MoveableNode. And it’s working!
  • Need to stop calculation when dx/dy drop below a certain threshold (see the sketch after this list), and click on node to:
    • Drag
    • Get info
    • Set source and target for trajectories
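
Re the threshold item above: a sketch of stopping the force calculation once node velocities settle. ForceNode here is a minimal stand-in with just the fields needed for the check, not the actual class:

class ForceNode:                     # minimal stand-in for the diary's ForceNode
    def __init__(self):
        self.dx = 0.0                # current velocity components
        self.dy = 0.0

def layout_settled(nodes, threshold: float = 0.01) -> bool:
    # True when every node's velocity magnitude has dropped below the threshold
    return all((n.dx * n.dx + n.dy * n.dy) ** 0.5 < threshold for n in nodes)

# in the animation loop:
# if layout_settled(node_list):
#     running = False   # stop recomputing forces; clicks can now select/drag nodes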

Phil 9.18.21

WeightWatcher (WW) is an open-source diagnostic tool for analyzing Deep Neural Networks (DNN), without needing access to training or even test data. It can be used to:

  • analyze pre/trained pyTorch, Keras, DNN models (Conv2D and Dense layers)
  • monitor models, and the model layers, to see if they are over-trained or over-parameterized
  • predict test accuracies across different models, with or without training data
  • detect potential problems when compressing or fine-tuning pretrained models
  • layer warning labels: over-trained; under-trained

GPT Agents

  • Finished extracting French, Chinese and Mexican reviews and ran the sentiment analyzer
  • Finished creating the French, Chinese, and Mexican models (50k reviews, 6 epochs). Need to run them next
  • I want to try WW (above) on the stars models and see what it says (a usage sketch follows this list)
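
Re trying WW on the stars models: a sketch based on the weightwatcher README of the time. The model path is a placeholder, the exact method names may have shifted between versions, and how well WW picks up GPT-2's Conv1D layers (it targets Dense/Conv2D) is something to check, so treat this as approximate:

import weightwatcher as ww
from transformers import GPT2LMHeadModel   # assuming the stars models are GPT-2 finetunes

model = GPT2LMHeadModel.from_pretrained("path/to/stars-model")   # hypothetical path
watcher = ww.WeightWatcher(model=model)
details = watcher.analyze()                # per-layer power-law / quality metrics
summary = watcher.get_summary(details)
print(summary)                             # e.g. average alpha, as a rough over/under-training signal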

Phil 9.17.2021

Short day today. Leave here NLT 10:45. Bring grocery bags!

Found the Pushshift Github today. Could be quite useful

Submitted the actual dissertation to the copyright office.

GPT Agents

  • Working on training models for French, Chinese, and Mexican restaurants on the corpora I built yesterday. Need to extract 10k items of ground truth while training
  • French Model
***** train metrics *****
  epoch                    =        6.0
  train_loss               =     2.9698
  train_runtime            = 2:32:07.50
  train_samples            =       5929
  train_samples_per_second =      3.897
  train_steps_per_second   =      3.897
09/16/2021 19:05:49 - INFO - __main__ - *** Evaluate ***
[INFO|trainer.py:2165] 2021-09-16 19:05:49,388 >> ***** Running Evaluation *****
[INFO|trainer.py:2167] 2021-09-16 19:05:49,388 >>   Num examples = 2281
[INFO|trainer.py:2170] 2021-09-16 19:05:49,388 >>   Batch size = 8
100%|████████████████████████████████████████████████████████████████████████████████| 286/286 [02:37<00:00,  1.82it/s]
***** eval metrics *****
  epoch                   =        6.0
  eval_loss               =     3.0133
  eval_runtime            = 0:02:38.06
  eval_samples            =       2281
  eval_samples_per_second =     14.431
  eval_steps_per_second   =      1.809
  perplexity              =    20.3535
  • Chinese Model
***** train metrics *****
  epoch                    =        6.0
  train_loss               =     2.9474
  train_runtime            = 2:29:11.94
  train_samples            =       5808
  train_samples_per_second =      3.893
  train_steps_per_second   =      3.893
09/17/2021 09:10:58 - INFO - __main__ - *** Evaluate ***
[INFO|trainer.py:2165] 2021-09-17 09:10:58,203 >> ***** Running Evaluation *****
[INFO|trainer.py:2167] 2021-09-17 09:10:58,203 >>   Num examples = 1090
[INFO|trainer.py:2170] 2021-09-17 09:10:58,203 >>   Batch size = 8
100%|████████████████████████████████████████████████████████████████████████████████| 137/137 [01:15<00:00,  1.81it/s]
***** eval metrics *****
  epoch                   =        6.0
  eval_loss               =     2.9766
  eval_runtime            = 0:01:16.25
  eval_samples            =       1090
  eval_samples_per_second =     14.295
  eval_steps_per_second   =      1.797
  perplexity              =    19.6214
  • Mexican Model
***** train metrics *****
  epoch                    =        6.0
  train_loss               =     2.9156
  train_runtime            = 1:48:46.94
  train_samples            =       4214
  train_samples_per_second =      3.874
  train_steps_per_second   =      3.874
09/17/2021 11:04:52 - INFO - __main__ - *** Evaluate ***
[INFO|trainer.py:2165] 2021-09-17 11:04:52,237 >> ***** Running Evaluation *****
[INFO|trainer.py:2167] 2021-09-17 11:04:52,237 >>   Num examples = 2175
[INFO|trainer.py:2170] 2021-09-17 11:04:52,237 >>   Batch size = 8
100%|████████████████████████████████████████████████████████████████████████████████| 272/272 [02:29<00:00,  1.82it/s]
***** eval metrics *****
  epoch                   =        6.0
  eval_loss               =     2.9759
  eval_runtime            = 0:02:30.52
  eval_samples            =       2175
  eval_samples_per_second =     14.449
  eval_steps_per_second   =      1.807
  perplexity              =    19.6069

SBIR(s)

  • Work on getting an animated canvas working. Done!