Phil 7.3.20

Today is a federal holiday, so no rocket science

Huggingface has a pipeline interface now that is pretty abstract. This works:

from transformers import pipeline

translator = pipeline("translation_en_to_fr")
print(translator("Hugging Face is a technology company based in New York and Paris", max_length=40))

[{‘translation_text’: ‘Hugging Face est une entreprise technologique basée à New York et à Paris.’}]

Wow: GPT-3 writes code!

DtZ is back up! Too many countries have the disease and the histories had to be cropped to stay under the data cap for the free service

GPT-2 Agents

Work on more granular path finding
- Going to try the hypotenuse of distance to source and line first – nope
- Trying looking for the distances of each and doing a nested sort
- I had a problem where I was checking to see whether a point was between the current node and the target node using the original line between the source and target nodes. Except that I was checking on a lone from the current node to the target, and failing the test. Oops! Fixed
- I went back to the hypotenuse version now that the in_between test isn’t broken and look at that!

granular

- Added the option for coarse or granular paths
Start thinking about topic extraction for a given corpus

#COVID

Evaluate Arabic to English translation. Got it working!

from transformers import MarianTokenizer, MarianMTModel
from typing import List
src = 'ar'  # source language
trg = 'en'  # target language
sample_text = "لم يسافر أبي إلى الخارج من قبل"
sample_text2 = "الصحة_السعودية تعلن إصابة أربعيني بفيروس كورونا بالمدينة المنورة حيث صنفت عدواه بحالة أولية مخالطة الإبل مشيرة إلى أن حماية الفرد من(كورونا)تكون باتباع الإرشادات الوقائية والمحافظة على النظافة والتعامل مع #الإبل والمواشي بحرص شديد من خلال ارتداء الكمامة "
mname = f'Helsinki-NLP/opus-mt-{src}-{trg}'

model = MarianMTModel.from_pretrained(mname)
tok = MarianTokenizer.from_pretrained(mname)
batch = tok.prepare_translation_batch(src_texts=[sample_text2])  # don't need tgt_text for inference
gen = model.generate(**batch)  # for forward pass: model(**batch)
words: List[str] = tok.batch_decode(gen, skip_special_tokens=True) 
print(words)

It took a few tries to find the right model. The naming here is very haphazard.

Asked for a sanity check from the group

This:

الصحة_السعودية تعلن إصابة أربعيني بفيروس كورونا بالمدينة المنورة حيث صنفت عدواه بحالة أولية مخالطة الإبل مشيرة إلى أن حماية الفرد من(كورونا)تكون باتباع الإرشادات الوقائية والمحافظة على النظافة والتعامل مع #الإبل والمواشي بحرص شديد من خلال ارتداء الكمامة

Translates to this:

Saudi health announces a 40-year-old corona virus in the city of Manora, where his enemy was classified as a primary camel conglomerate, indicating that the protection of the individual from Corona would be through preventive guidance, hygiene, and careful handling of the Apple and the cattle by wearing the gag.

Write script that takes a batch of rows and adds translations until all the rows in the table are complete

Book chat

List of folks who would be interesting to interview
- Stewart Russel
- Stuart Kauffman
- Alex (Sandy) Pentland
- Kate Starbird
- Joanna Bryson
- Daniel DeNicola
- Margaret Gilbert
- Joseph Liechty /Cecelia Clegg
- Rebecca Solint
- Zeynep Tefuchi
- Christian Jacob (directeur de recherche at the Centre national de la recherche scientifique in Paris)
- Ezra Klien

viztales

Dimension reduction, State, Orientation, and Speed

Phil 7.3.20

Share this:

Related