Monthly Archives: October 2024

Phil 10.9.2024

Bad day for hacked things:

Hacked ‘AI Girlfriend’ Data Shows Prompts Describing Child Sexual Abuse

  • “It’s basically a handful of open-source projects duct-taped together. I started poking around and found some vulnerabilities relatively quickly. At the start it was mostly just curiosity but I decided to contact you once I saw what was in the database.”

Jim Donnies! Done

SBIRs

  • 11:00 MP+SimAccel proposal meeting – went well, I think. Very different approaches. We’re new, bespoke, and they are legacy
  • 1:30 LM MP tagup
  • Work on book – working!

GPT Agents

  • Ping everyone to say I’ve finished my pass through the paper
  • add \t to bib script

Phil 10.8.2024

SBIRs

  • 9:00 Standup – done!
  • LM White Paper? This can wait, actually. Ron will need to do the MLOps, and he can’t even get started until he returns from his BD tasks and trials

GPT Agents

  • Continue with paper – done!
  • Ping Greg to discuss his comments

Phil 10.7.2024

Helene response hampered by misinformation, conspiracy theories

  • Officials have sought to tamp down the misinformation that has continued to spread online. The Federal Emergency Management Agency has been updating a webpage seeking to dispute common rumors, while the North Carolina Department of Public Safety has done the same, writing that authorities were “working around-the-clock to save lives and provide humanitarian relief.”

AI-Generated Pro-North Korean TikToks Are Also Bizarre Ads for Supplements

  • The ads also use an interesting blend of AI-generated cover images and real images within the slideshow itself. And not all of the ads are about North Korea. Some of them use AI-generated images of Taylor Swift and Jennifer Aniston to shill the supplements, while other slideshows are spreading disinformation about Mpox, are about TikTok trends like “demure,” or claim the supplements are “better than Ozempic.” 

Sample ballot of Baltimore County

Grants

  • Finish review 14 – Done and submitted!

SBIRs

  • Work on LM white paper with Aaron?
  • 3:00 Demo kickoff meeting – mostly figuring out what resources (compute, screens, etc.) will be needed

GPT Agents

  • Work on paper. Wrote the script to convert footnotes to citations. It works well! Had a few issues getting raw strings to behave (see the note after the script):
from tkinter import filedialog
import re
from typing import List, Dict

def load_file_to_list(filename:str) -> List:
    print("opening {}".format(filename))
    try:
        with open(filename, 'r') as file:
            lines = file.readlines()
            return [line.strip() for line in lines]
    except FileNotFoundError:
        print("Error: File '{}' not found".format(filename))
        return []

def save_list_to_file(l:List, filename:str):
    print("writing {}".format(filename))
    s:str
    try:
        with open(filename, 'w') as file:
            for s in l:
                file.write("{}\n".format(s))
    except OSError:
        print("Error: could not write file '{}'".format(filename))

filename = filedialog.askopenfilename(filetypes=(("tex files", "*.tex"),), title="Load tex File")
if filename:
    filename2 = filename.replace(".tex", "_mod.tex")
    filename3 = filename.replace(".tex", "_mod.bib")
    # read the tex file into a list of lines
    l:List = load_file_to_list(filename)
    p1 = re.compile(r"\\footnote{\\url{(.*?)}}") # matches \footnote{\url{...}} and captures the url
    p2 = re.compile(r"https?://([^/]*)") # captures the domain, which we'll use for the cite key

    s1:str
    s2:str
    s3:str
    l2 = []
    cite_dict = {}
    count = 1
    for s1 in l: # Get each line in the file
        #print(s1)
        m1 = p1.findall(s1) # find all the footnote urls
        for s2 in m1:
            #print("\t{}".format(s2))
            m2 = p2.match(s2) # pull out the domain we'll use for our cite key
            if m2 is None: # skip anything that isn't an http(s) url
                continue
            s3 = m2.group(1).removeprefix('www.') # (Python 3.9+) strip('www.') would also eat a leading 'w' from domains like wired.com
            s3 = "{}_{}".format(s3, count)
            #print("\t\t{}".format(s3))
            olds = r"\footnote{\url{"+s2+"}}"
            news = r"\cite{"+s3+"}"
            #print("olds = {}[{}], news = {}".format(olds, s1.find(olds), news))
            s1 = s1.replace(olds, news)
            cite_dict[s3] = s2
            count += 1 # bump the counter so repeated domains get unique cite keys
        l2.append(s1)
        print(s1)
    save_list_to_file(l2, filename2) # write the modified text to a new file

    l2 = []
    for key, val in cite_dict.items():
        s = "@misc{"+key+",\n"
        s += '\tauthor = "{Last, First}",\n'
        s += '\tyear = "2024",\n'
        s += '\thowpublished = "\\url{'+val+'}",\n'
        s += 'note = "[Online; accessed 07-October-2024]"\n}\n'
        print(s)
        l2.append(s)
    save_list_to_file(l2, filename3) # write the citation text to a .bib file
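
The “raw strings” issue above is worth a note. In a normal Python string, “\f” is a form-feed character and “\u” starts a unicode escape, so building "\footnote{\url{" without the r prefix is either silently wrong or a SyntaxError; and inside a regex a literal backslash has to be written as \\. A small illustration (the URL and cite key here are made up):

import re

p = re.compile(r"\\footnote{\\url{(.*?)}}")  # \\ matches one literal backslash
line = r"as shown\footnote{\url{https://www.example.com/page}} in the text"
print(p.findall(line))  # ['https://www.example.com/page']

olds = r"\footnote{\url{https://www.example.com/page}}"  # raw string keeps the backslashes literal
print(line.replace(olds, r"\cite{example.com_1}"))  # as shown\cite{example.com_1} in the text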



Phil 10.5.2024

U.S. Wiretap Systems Targeted in China-Linked Hack

  • A cyberattack tied to the Chinese government penetrated the networks of a swath of U.S. broadband providers, potentially accessing information from systems the federal government uses for court-authorized network wiretapping requests.
  • Here are my AI-weapons thoughts on this: 1) If you can plant a man-in-the-middle (MitM) LLM that nudges people toward legislating back doors to fight cybercrime, you could set up this kind of operation. 2) If those back doors already exist, you can plant LLMs to cause further havoc, or to adjust your adversary’s behavior in more subtle ways.

Phil 10.4.2024

People Are Sharing Fake Hurricane Helene Photos for Profit and Political Gain

  • “[Community Note] says this is AI. In this case, I don’t care. We should look out for our own (Americans before the rest of the world and I wouldn’t be at all surprised if there was a girl fitting the description that wasn’t lucky enough to make it to a photographer for such an image,” wrote yet another.

Where Facebook’s AI Slop Comes From

  • Facebook itself is paying creators in India, Vietnam, and the Philippines for bizarre AI spam that they are learning to make from YouTube influencers and guides sold on Telegram.

Phil 10.3.2024

An Audacious Plan to Halt the Internet’s Enshittification and Throw It Into Reverse

TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices

  • Large model inference is shifting from cloud to edge due to concerns about the privacy of user interaction data. However, edge devices often struggle with limited computing power, memory, and bandwidth, requiring collaboration across multiple devices to run and speed up LLM inference. Pipeline parallelism, the mainstream solution, is inefficient for single-user scenarios, while tensor parallelism struggles with frequent communications. In this paper, we argue that tensor parallelism can be more effective than pipeline on low-resource devices, and present a compute- and memory-efficient tensor parallel inference system, named TPI-LLM, to serve 70B-scale models. TPI-LLM keeps sensitive raw data local in the users’ devices and introduces a sliding window memory scheduler to dynamically manage layer weights during inference, with disk I/O latency overlapped with the computation and communication. This allows larger models to run smoothly on memory-limited devices. We analyze the communication bottleneck and find that link latency, not bandwidth, emerges as the main issue, so a star-based allreduce algorithm is implemented. Through extensive experiments on both emulated and real testbeds, TPI-LLM demonstrated over 80% less time-to-first-token and token latency compared to Accelerate, and over 90% compared to Transformers and Galaxy, while cutting the peak memory footprint of Llama 2-70B by 90%, requiring only 3.1 GB of memory for 70B-scale models.
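
The star-based allreduce mentioned in the abstract is easy to picture: every device sends its partial tensor to one hub device, the hub sums them, and the sum is sent back, so each reduction costs two link hops no matter how many devices participate (which is why link latency, not bandwidth, is the thing to minimize). A minimal single-process sketch of that reduction logic, not TPI-LLM’s actual code:

import numpy as np

def star_allreduce(partials):
    # conceptual star allreduce: 'device 0' acts as the hub.
    # every device sends its partial result to the hub, the hub sums them,
    # and the sum is broadcast back; two hops per reduction, independent
    # of the number of devices.
    hub_sum = np.sum(partials, axis=0)          # gather + reduce at the hub
    return [hub_sum.copy() for _ in partials]   # broadcast back to each device

# four 'edge devices', each holding a partial result from its slice of a tensor-parallel layer
parts = [np.random.rand(8) for _ in range(4)]
reduced = star_allreduce(parts)
assert all(np.allclose(r, np.sum(parts, axis=0)) for r in reduced)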

SBIRs

  • Look for new BAAs – done
  • 9:00 Standup – done
  • 12:45 USNA – trainwreck, hoping to fix
  • Expense travel – done

GPT Agents

  • Working on conclusions. I think I have a nice hook from the Gutenberg Parenthesis
  • 9:30 meeting with Matt. Show the GPM demo and talk about coordination with expensive information, stiff & dense networks vs. slack and sparse, the need for embodiment to find truly novel things, the curse of dimensionality, explore/exploit and the need for diversity. done!
  • Respond to IUI – done

Grants

  • Start review 14 – started

Phil 10.2.2024

Horny Robot Baby Voice: James Vincent on AI chatbots

  • Of all such apps I have tried, the most ambitious is Replika. Unlike most of its competitors, it has a chat interface with elements that bring it close to life-simulation games like The Sims. You’re invited to name and design your bot, with various options for hairstyle and skin color, along with sliders that adjust the size of breasts or muscles. You’re then booted into a sort of bot purgatory: a white-walled waiting room, sparsely furnished, where the avatar paces like a prisoner, waiting for you to strike up conversation. Users are encouraged to customize the room with furniture and acquire new outfits using in-app currency. This can be bought with real money or earned by completing ‘quests’ such as talking to your bot every day or sharing photos. It’s a feedback loop that encourages constant engagement and self-disclosure, rewarding users with the power of customization, so that the bot feels made for you and you alone.

SBIRs

  • Honestly, everything here is support other than BD Opportunity research. I do think I’ll dig into that after lunch since there should be new BAAs now?

GPT Agents

  • Working on challenges section – first draft done!
  • 3:00 Alden meeting

Phil 10.1.2024

This looks very good, if a bit dated. Deepset/Haystack appear to have continued development, so check out the website first: Build a Search Engine with GPT-3

  • Semantic search engines — our specialty here at deepset — are often powered by extractive question answering models. These models return snippets from the knowledge base verbatim, rather than generating text from scratch the way ChatGPT does. However, many applications can benefit from the abilities of generative LLMs. That’s why Haystack, deepset’s open-source framework for applied natural language processing (NLP), allows you to leverage multiple GPT models in your pipeline. With this approach, you can build a GPT-powered semantic search engine that uses your own data as ground truth and bases its natural-language answers on the information it contains.
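
The pattern the quote describes, retrieval grounded in your own documents followed by a generative answer, is easy to sketch without any particular framework. This is not Haystack’s API; the corpus, the term-overlap scorer, and the prompt format below are all stand-ins, and the actual GPT call is left as a placeholder:

from collections import Counter

DOCS = [
    "Haystack pipelines combine a retriever with a reader or a generator.",
    "Extractive QA returns verbatim snippets from the knowledge base.",
    "Generative models compose natural-language answers from retrieved context.",
]

def score(query: str, doc: str) -> int:
    # stand-in for an embedding or BM25 retriever: count shared lowercase terms
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum((q & d).values())

def retrieve(query: str, k: int = 2) -> list:
    return sorted(DOCS, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(query: str) -> str:
    # the retrieved context is what keeps the generated answer grounded in your own data
    context = "\n".join(retrieve(query))
    return "Answer using only this context:\n{}\n\nQuestion: {}\nAnswer:".format(context, query)

print(build_prompt("How do generative models answer questions?"))
# a call to a GPT model (e.g. through a Haystack generator node or the OpenAI API) would go here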

SBIRs

  • Maybe set up the trade show demo project? Nope, but soon, probably

Grants

  • Submit review of proposal 10. Done. And 12!

GPT Agents

  • Work on challenges section. Did review 12 instead. I’ll work on this tomorrow morning