This paper, which is refreshingly brutal and honest, summarises why I have been having problems getting good results from RAG.
Basically, in its current incarnation, RAG simply doesn’t work, period. Yes, I know there are good examples showing impressive results with RAG. However, these examples don’t always mention how much tweaking was required. Parameters like chunk size, overlap and top_k can be the difference between a good result and a horrible one. The prompts also need tweaking, sometimes on a per-query basis. Even things like the choice of embedding algorithm or vector database can sometimes make a difference.
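To make that tuning surface concrete, here is a minimal sketch of the retrieval step in plain Python. The `embed` argument is a stand-in for whatever embedding model you use (an assumption, not a real API), and the parameter values are illustrative, not recommendations:

```python
import math

CHUNK_SIZE = 512   # characters per chunk: too small loses context, too large dilutes relevance
OVERLAP = 64       # characters shared between neighbouring chunks
TOP_K = 4          # how many chunks get handed to the LLM

def chunk(text: str) -> list[str]:
    """Split text into overlapping chunks of CHUNK_SIZE characters."""
    step = CHUNK_SIZE - OVERLAP
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), step)]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def retrieve(query: str, chunks: list[str], embed) -> list[str]:
    """Return the TOP_K chunks most similar to the query.

    `embed` is a placeholder: any function mapping a string to a vector.
    """
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:TOP_K]
```

Every constant in that sketch is a knob, and in my experience none of them transfer cleanly from one corpus or query style to another.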
To quote from the paper:
The problem here isn’t that large language models hallucinate, lie, or misrepresent the world in some way. It’s that they are not designed to represent the world at all; instead, they are designed to convey convincing lines of text. So when they are provided with a database of some sort, they use this, in one way or another, to make their responses more convincing. But they are not in any real way attempting to convey or transmit the information in the database. As Chirag Shah and Emily Bender put it: “Nothing in the design of language models (whose training task is to predict words given context) is actually designed to handle arithmetic, temporal reasoning, etc. To the extent that they sometimes get the right answer to such questions is only because they happened to synthesize relevant strings out of what was in their training data. No reasoning is involved […] Similarly, language models are prone to making stuff up […] because they are not designed to express some underlying set of information in natural language; they are only manipulating the form of language” (Shah & Bender, 2022). These models aren’t designed to transmit information, so we shouldn’t be too surprised when their assertions turn out to be false.
To summarise:
Investors, policymakers, and members of the general public make decisions on how to treat these machines and how to react to them based not on a deep technical understanding of how they work, but on the often metaphorical way in which their abilities and function are communicated. Calling their mistakes “hallucinations” isn’t harmless: it lends itself to the confusion that the machines are in some way misperceiving but are nonetheless trying to convey something that they believe or have perceived. This, as we’ve argued, is the wrong metaphor. The machines are not trying to communicate something they believe or perceive. Their inaccuracy is not due to misperception or hallucination. As we have pointed out, they are not trying to convey information at all. They are bullshitting.
As I’ve discovered, LLMs have huge coherence issues with long context lengths. It’s no secret that many techniques for RAG and summarisation rely on breaking the context into smaller chunks: chain of density, MapReduce and so on (see the sketch below). By the time an LLM has finished summarising one paragraph, it may have forgotten it a few paragraphs later. How can we hope that an LLM will translate a DN sutta while maintaining context and coherence, even if we apply chunking techniques? Some say this is only a problem with current-generation LLMs, and that new models are coming out with 32K-128K context lengths. However, my limited testing suggests this is simply a marketing number: models with large context sizes don’t seem to be significantly better at maintaining coherence and context.
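For reference, MapReduce-style summarisation in its simplest form looks something like this. The `llm` argument is a stand-in for a call to whatever model or API you use (an assumption, not a real library call); the structure itself shows where coherence is lost, since each chunk is summarised with no knowledge of the others:

```python
def split(text: str, size: int = 2000) -> list[str]:
    """Naive fixed-size chunking, no overlap."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def summarise(text: str, llm) -> str:
    """MapReduce summarisation: summarise chunks, then summarise the summaries.

    `llm` is a placeholder: any function mapping a prompt string to a completion.
    """
    # Map: each chunk is summarised independently, so anything that ties
    # one chunk to another (pronouns, recurring terms, narrative flow) is lost here.
    partials = [llm(f"Summarise:\n\n{c}") for c in split(text)]
    # Reduce: combine the partial summaries, compounding whatever was lost above.
    return llm("Combine these partial summaries into one coherent summary:\n\n"
               + "\n\n".join(partials))
```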
Given the above, I think there are real issues with using LLMs for anything related to Buddhism. These models have no regard for the truth; they do not care about realisation, extinguishment, or the cessation of suffering. They have a huge potential for creating more suffering, and for delaying the achievement of the soteriological goal.