Phil 3.5.2024

Started the day off on the wrong foot by dropping my breakfast. Grumble

SBIRs

  • Starting LM white paper
  • NIST AI COE presentation. Slides are done! Need to copy them over to PowerPoint and make copies.

GPT Agents

  • Need to make a 10-minute version of the presentation
  • Need to see how the upstairs TV could work as a monitor
  • Need to put together a new poster
  • And I really need to add an AI White Hat section to the KillerApps paper, based on the reception the idea got today
  • The paper is up on arXiv: RAGged Edges: The Double-Edged Sword of Retrieval-Augmented Chatbots
    • Large language models (LLMs) like ChatGPT demonstrate the remarkable progress of artificial intelligence. However, their tendency to hallucinate — generate plausible but false information — poses a significant challenge. This issue is critical, as seen in recent court cases where ChatGPT’s use led to citations of non-existent legal rulings. This paper explores how Retrieval-Augmented Generation (RAG) can counter hallucinations by integrating external knowledge with prompts. We empirically evaluate RAG against standard LLMs using prompts designed to induce hallucinations. Our results show that RAG increases accuracy in some cases, but can still be misled when prompts directly contradict the model’s pre-trained understanding. These findings highlight the complex nature of hallucinations and the need for more robust solutions to ensure LLM reliability in real-world applications. We offer practical recommendations for RAG deployment and discuss implications for the development of more trustworthy LLMs.
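The core RAG mechanism the abstract describes, integrating retrieved external knowledge with the prompt before it reaches the LLM, can be sketched roughly as follows. This is a toy illustration only: the keyword-overlap retriever and the prompt template are my own assumptions, not the paper's actual experimental setup.

```python
# Minimal sketch of RAG prompt construction: retrieve relevant context,
# then prepend it to the user's question so the model can ground its
# answer in it rather than hallucinate. The retriever here is a naive
# keyword-overlap ranker, purely for illustration.

def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by keyword overlap with the query; return top k."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_rag_prompt(query: str, documents: list[str]) -> str:
    """Combine retrieved context with the question into one prompt."""
    context = "\n".join(retrieve(query, documents))
    return (
        "Use only the context below to answer.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

docs = [
    "The ruling in Smith v. Jones was issued in 2019.",
    "Bananas are rich in potassium.",
]
print(build_rag_prompt("When was the ruling in Smith v. Jones issued?", docs))
```

As the abstract notes, grounding like this improves accuracy in some cases but does not eliminate hallucinations, especially when the prompt contradicts the model's pre-trained knowledge.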