Category Archives: Torch

Phil 7.3.20

Today is a federal holiday, so no rocket science

Huggingface has a pipeline interface now that is pretty abstract. This works:

from transformers import pipeline

translator = pipeline("translation_en_to_fr")
print(translator("Hugging Face is a technology company based in New York and Paris", max_length=40))
  • [{‘translation_text’: ‘Hugging Face est une entreprise technologique basée à New York et à Paris.’}]

Wow: GPT-3 writes code!

DtZ is back up! Too many countries have the disease and the histories had to be cropped to stay under the data cap for the free service

GPT-2 Agents

  • Work on more granular path finding
    • Going to try the hypotenuse of distance to source and line first – nope
    • Trying looking for the distances of each and doing a nested sort
    • I had a problem where I was checking to see whether a point was between the current node and the target node using the original line between the source and target nodes. Except that I was checking on a lone from the current node to the target, and failing the test. Oops! Fixed
    • I went back to the hypotenuse version now that the in_between test isn’t broken and look at that!


    • Added the option for coarse or granular paths
  • Start thinking about topic extraction for a given corpus


  • Evaluate Arabic to English translation. Got it working!
    from transformers import MarianTokenizer, MarianMTModel
    from typing import List
    src = 'ar'  # source language
    trg = 'en'  # target language
    sample_text = "لم يسافر أبي إلى الخارج من قبل"
    sample_text2 = "الصحة_السعودية تعلن إصابة أربعيني بفيروس كورونا بالمدينة المنورة حيث صنفت عدواه بحالة أولية مخالطة الإبل مشيرة إلى أن حماية الفرد من(كورونا)تكون باتباع الإرشادات الوقائية والمحافظة على النظافة والتعامل مع #الإبل والمواشي بحرص شديد من خلال ارتداء الكمامة "
    mname = f'Helsinki-NLP/opus-mt-{src}-{trg}'
    model = MarianMTModel.from_pretrained(mname)
    tok = MarianTokenizer.from_pretrained(mname)
    batch = tok.prepare_translation_batch(src_texts=[sample_text2])  # don't need tgt_text for inference
    gen = model.generate(**batch)  # for forward pass: model(**batch)
    words: List[str] = tok.batch_decode(gen, skip_special_tokens=True) 
  • It took a few tries to find the right model. The naming here is very haphazard.
  • Asked for a sanity check from the group
    • This:
      الصحة_السعودية تعلن إصابة أربعيني بفيروس كورونا بالمدينة المنورة حيث صنفت عدواه بحالة أولية مخالطة الإبل مشيرة إلى أن حماية الفرد من(كورونا)تكون باتباع الإرشادات الوقائية والمحافظة على النظافة والتعامل مع #الإبل والمواشي بحرص شديد من خلال ارتداء الكمامة
    • Translates to this:
      Saudi health announces a 40-year-old corona virus in the city of Manora, where his enemy was classified as a primary camel conglomerate, indicating that the protection of the individual from Corona would be through preventive guidance, hygiene, and careful handling of the Apple and the cattle by wearing the gag.


  • Write script that takes a batch of rows and adds translations until all the rows in the table are complete

Book chat

Phil 1.2.20

7:00 – 4:30 ASRC PhD

  • More highlighting and slides. Once I get through the Background section, I’ll write the overview, then repeat that patterns.
    • I’m tweaking too much text to keep the markup version. Sigh.
    • Finished Background and sent that to Wayne
  • GPT-2 Agents. See if we can get multiple texts generated – nope
    • Build a corpus of .txt files
    • Try running them through LMN
  • No NOAA meeting
  • No ORCA meeting

Phil 1.30.19

7:00 – 7:00 ASRC PhD


  • Nice visualization, with map-like aspects: The Climate Learning Tree
  •  Dissertation
    • Start JuryRoom section – done!
    • Finished all content!
  • GPT-2 Agents
    • Download big model and try to run it
    • Move models and code out of the transformers project
  • GOES
    • Learning by Cheating (sounds like a mechanism for simulation to work with)
      • Vision-based urban driving is hard. The autonomous system needs to learn to perceive the world and act in it. We show that this challenging learning problem can be simplified by decomposing it into two stages. We first train an agent that has access to privileged information. This privileged agent cheats by observing the ground-truth layout of the environment and the positions of all traffic participants. In the second stage, the privileged agent acts as a teacher that trains a purely vision-based sensorimotor agent. The resulting sensorimotor agent does not have access to any privileged information and does not cheat. This two-stage training procedure is counter-intuitive at first, but has a number of important advantages that we analyze and empirically demonstrate. We use the presented approach to train a vision-based autonomous driving system that substantially outperforms the state of the art on the CARLA benchmark and the recent NoCrash benchmark. Our approach achieves, for the first time, 100% success rate on all tasks in the original CARLA benchmark, sets a new record on the NoCrash benchmark, and reduces the frequency of infractions by an order of magnitude compared to the prior state of the art. For the video that summarizes this work, see this https URL
  • Meeting with Aaron
    • Overview at the beginning of each chapter – look at Aaron’s chapter 5 for
    • example intro and summary.
    • Callouts in text should match the label
    • hfill to right-justify
    • Footnote goes after puntuation
    • Punctuation goes inside quotes
    • for url monospace use \texttt{} (
    • indent blockquotes 1/2 more tab
    • Non breaking spaces on names
    • Increase figure sizes in intro