AI-3: There is no road from here to there

Haha. This is the perfect antidote after a week playing around with various LLMs and embedding models, various RAG and summarisation techniques, LlamaIndex and LangChain. I was starting to think AI is my personal Mara!

Just imagine what my frustration would be if I actually had to do this for a living, instead of just playing around! At least I am now under no illusion about the current capabilities of AI! There may be light at the end of the tunnel, but we shall see!

The service aims to provide faster care and reduce waiting times and pressure on clinicians, those behind it say.

Waiting lists for treatment for musculoskeletal (MSK) problems such as back, neck, and knee pain have grown by 27% since January last year.

Or you could just, I don’t know, not underfund public healthcare?

If only the NHS minister used an LLM to solve the problem!

Here is my prompt to Llama 3 70B:

I am the health minister responsible for funding the National Health Service. Waiting lists for treatment for musculoskeletal (MSK) problems such as back, neck, and knee pain have grown by 27% since January last year. What should I do?

The answer:

A pressing concern! As your trusted advisor, I’ll provide you with a comprehensive approach to address the growing waiting lists for MSK treatment.

Short-term measures (0-6 months)

  1. Increase funding for existing services: Allocate additional resources to current MSK services to boost capacity and reduce waiting times.
  2. Recruit temporary staff: Hire locum consultants, physiotherapists, and other healthcare professionals on a short-term basis to help clear the backlog.
  3. Optimize clinic schedules: Review and optimize clinic schedules to maximize throughput and minimize idle time.

Medium-term measures (6-18 months)

  1. Invest in community-based services: Expand community-based MSK services, such as physiotherapy and osteopathy, to reduce referrals to hospitals and alleviate pressure on secondary care.
  2. Develop virtual clinics: Establish virtual clinics for MSK consultations, enabling remote assessments and reducing the need for hospital visits.
  3. Introduce triage systems: Implement effective triage systems to prioritize patients based on clinical urgency and direct them to the most appropriate treatment pathway.

Long-term measures (18 months+)

  1. Invest in preventative care: Allocate funding to promote preventative measures, such as exercise programs, education, and workplace wellness initiatives, to reduce the incidence of MSK problems.
  2. Develop specialized MSK centers: Establish specialized MSK centers with multidisciplinary teams, including orthopedic surgeons, physiotherapists, and pain management specialists, to provide comprehensive care.
  3. Implement innovative technologies: Invest in innovative technologies, such as AI-powered diagnostic tools and robotic-assisted surgery, to improve treatment outcomes and efficiency.

Additional recommendations

  1. Collect and analyze data: Establish a robust data collection system to monitor waiting lists, treatment outcomes, and patient satisfaction. Analyze this data to identify trends, areas for improvement, and opportunities for service redesign.
  2. Engage with stakeholders: Collaborate with patients, healthcare professionals, and organizations to gain insights into the challenges faced by MSK services and develop solutions that meet their needs.
  3. Develop a workforce strategy: Create a strategic plan to attract, retain, and develop a skilled workforce in MSK specialties, ensuring that the NHS has the necessary talent to deliver high-quality care.

By implementing these measures, you can effectively address the growing waiting lists for MSK treatment, improve patient outcomes, and strengthen the National Health Service.

Wow. Just this morning on my way back from pindapata, I dropped by our community center to leave some extra food. There’s an intern there, a social worker student, and we got talking about AI. She can’t stand it, she thinks it’s just ruined education, people using it to cheat all the time, which then makes it unfair for the honest students. But she said that as a social worker, at least her job was safe.

But, well, here we are.

OMG!

There is really nothing amazing about this AI barf. It’s just chewed up policy recommendations that were written by real humans. And of course if you don’t have enough doctors, you need to hire more. It’s not rocket science. Our problem is not lack of AI, it’s people convincing themselves that AI is the solution.

Fair enough, it’s more that the actual solutions are so obvious that even AI barf is better than what they actually plan to do.

The issue is: not only are people irrationally optimistic about AI and easily impressed by what it produces, they are not even reading what it produces to realise it is making mistakes. They are just impressed by the apparent lucidity and coherence of the response.

I had this experience first-hand attending a prompt engineering workshop. Everyone was enthusing about how varying the prompt produced better results. Almost no one was looking at the actual output and realising that it wasn’t that great, or accurate.

Years of social media doom scrolling have now taught us not to pay attention: it’s the vibe that matters, not the actual content. And this observation is from a person (me) who spent years being a “high level visionary, unconcerned with mere details”. Now everyone is like me, except they haven’t learnt the lesson that ultimately details are important, or they will bite you.

:face_holding_back_tears:

So, I have been wondering: why do I get such mixed results with my own experiments with LLMs and RAG? Am I doing something wrong?

I’ve been following various academic papers on RAG techniques. I am able to reproduce their results using their examples, but when I substitute my own content the results are far more variable and … less satisfying. Am I incompetent? Why do the “pros” such as Perplexity (a startup AI search engine that unleashes RAG on the World Wide Web) seem to get better outcomes?

So it turns out, they don’t. Their engine also misquotes, misappropriates, provides missing or wrong citations, and makes stuff up. Except they bullshit about it, and even deny that they are scraping content using servers outside their published list, ignoring robot exclusion rules, and failing to direct traffic to or attribute their sources.

This paper, which is refreshingly brutal and honest, summarises why I have been having problems getting good results from RAG.

Basically, in its current incarnation, RAG simply doesn’t work, period. Yes, I know there are good examples showing impressive results with RAG. However, these examples don’t necessarily point out that it requires tweaking. Parameters like context size, overlap and top_k may be the difference between a good result, and a horrible one. The prompts also need to be tweaked, sometimes on a per query basis. Even things like the choice of an embedding algorithm or vector database can sometimes make a difference.
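
To make the tweaking concrete, here is a minimal sketch of the retrieval half of a RAG pipeline with those knobs exposed. It is not how LlamaIndex, LangChain or any real product does it: the function names are made up for illustration, I am assuming "context size" means the chunk size in words, and a toy bag-of-words cosine similarity stands in for a real embedding model and vector database.

```python
import math
import re
from collections import Counter


def chunk_words(text, chunk_size, overlap):
    """Split text into windows of `chunk_size` words; neighbours share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def bag_of_words(text):
    """Toy stand-in for an embedding: lower-cased token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, top_k):
    """Return the `top_k` chunks most similar to the query, best first."""
    q = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    document = "cats sleep all day while dogs chase the ball and the ball is red"
    # Hypothetical settings: changing any of these changes what the LLM gets to see.
    for size, overlap, k in [(4, 0, 1), (6, 2, 2)]:
        chunks = chunk_words(document, chunk_size=size, overlap=overlap)
        print(size, overlap, k, retrieve("dogs ball", chunks, top_k=k))
```

Running it with different `chunk_size`, `overlap` and `top_k` values hands the model different slices of the same document for the same query, which is roughly why a setup tuned on a paper's examples can fall over on your own content.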

To quote from the paper:

The problem here isn’t that large language models hallucinate, lie, or misrepresent the world in some way. It’s that they are not designed to represent the world at all; instead, they are designed to convey convincing lines of text. So when they are provided with a database of some sort, they use this, in one way or another, to make their responses more convincing. But they are not in any real way attempting to convey or transmit the information in the database. As Chirag Shah and Emily Bender put it: “Nothing in the design of language models (whose training task is to predict words given context) is actually designed to handle arithmetic, temporal reasoning, etc. To the extent that they sometimes get the right answer to such questions is only because they happened to synthesize relevant strings out of what was in their training data. No reasoning is involved […] Similarly, language models are prone to making stuff up […] because they are not designed to express some underlying set of information in natural language; they are only manipulating the form of language” (Shah & Bender, 2022). These models aren’t designed to transmit information, so we shouldn’t be too surprised when their assertions turn out to be false.

To summarise:

Investors, policymakers, and members of the general public make decisions on how to treat these machines and how to react to them based not on a deep technical understanding of how they work, but on the often metaphorical way in which their abilities and function are communicated. Calling their mistakes ‘hallucinations’ isn’t harmless: it lends itself to the confusion that the machines are in some way misperceiving but are nonetheless trying to convey something that they believe or have perceived. This, as we’ve argued, is the wrong metaphor. The machines are not trying to communicate something they believe or perceive. Their inaccuracy is not due to misperception or hallucination. As we have pointed out, they are not trying to convey information at all. They are bullshitting.

As I’ve discovered, LLMs have huge coherence issues with long context length. It’s no secret many of the techniques for RAG and summarisation rely on breaking down the context into smaller chunks - chain of density, MapReduce etc. By the time an LLM has finished summarising a paragraph, it may forget about it a few paragraphs later. How can we hope that an LLM will translate a DN sutta maintaining context and coherency, even if we apply chunking techniques? Some say this is only a problem with current generation LLMs, new models are coming out with 32K-128K context lengths. However, my limited testing shows this is simply a marketing number - models with large context sizes don’t seem to be significantly better at maintaining coherence and context.
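
For anyone who hasn't seen it, the MapReduce style of summarisation looks roughly like the sketch below. The `summarise` argument is a placeholder for an LLM call (here replaced by a toy function that keeps the first few words), and the function names and `max_words` limit are my own assumptions, not anything from a particular library.

```python
def split_into_chunks(text, max_words):
    """Split text into consecutive, non-overlapping chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def map_reduce_summarise(text, summarise, max_words):
    """Summarise text that is too long for one call.

    `summarise` is any callable(str) -> str; in a real pipeline it would wrap an LLM.
    """
    chunks = split_into_chunks(text, max_words)
    if len(chunks) <= 1:
        return summarise(text)
    # Map: each chunk is summarised on its own, with no view of the other chunks.
    partials = [summarise(chunk) for chunk in chunks]
    combined = " ".join(partials)
    if len(combined.split()) >= len(text.split()):
        # The summariser is not shrinking the text; stop instead of recursing forever.
        return summarise(combined)
    # Reduce: summarise the summaries, repeating until the text fits in one chunk.
    return map_reduce_summarise(combined, summarise, max_words)


def first_three_words(text):
    """Toy stand-in for an LLM summariser."""
    return " ".join(text.split()[:3])


if __name__ == "__main__":
    long_text = " ".join(f"w{i}" for i in range(1, 13))
    print(map_reduce_summarise(long_text, first_three_words, max_words=4))
```

The structural point is in the map step: every chunk is summarised in isolation, so anything that only makes sense across chunk boundaries has to survive by luck. That is the same loss of context I am worried about for a long sutta.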

Given the above, I think there are real issues with using LLMs on anything related to Buddhism. These models have no regard for the truth, they do not care about realisation, or extinguishment, or the cessation of suffering. They have a huge potential for creating more suffering, and delaying the achievement of the soteriological goal.

They also have no regard for the underbelly, conveniently out of view: massive data center requirements to support it all and the immense human toll to get the rare earth elements and label the dumb data for Tesla.

Those coal plants will keep on burning - Big Tech is quietly walking back their climate change commitments in the AI race:

Interesting article. It makes reference to “bionic duckweed”:

(The term comes from a real instance in the wild, in which the UK government was advised against electrifying railways in 2007 because “we might have … trains using hydrogen developed from bionic duckweed in 15 years’ time … we might have to take the wires down and it would all be wasted”. Seventeen years on, the UK continues to run diesel engines on non-electrified lines.)

And from the linked article on the topic:

In its broader sense, bionic duckweed can be thought of a sort of unobtainium that renders investment in present-day technologies pointless, unimaginative, and worst of all On The Wrong Side Of History. “Don’t invest in what can be done today, because once bionic duckweed is invented it’ll all be obsolete.” It is a sort of promissory note in reverse, forcing us into inaction today in the hope of wonders tomorrow.

This to me really puts a point on my fears related to translations of ancient texts. “Why invest in human translators now when they will be replaced in 15 years?”

Incredible new report from that hive of radical leftist thinking … Goldman Sachs.

Gen AI: too much spend, too little benefit?

Summarized and discussed on Twitter.

https://x.com/edzitron/status/1810362077867028497

And substack:

It’s one thing for the general public to be wowed by a technology demo. But I find it increasingly strange that a trillion dollars is being invested in an industry led by a man, Sam Altman, who overtly states that they have no path to profitability, and their plan is to create “AGI” and ask it how to become profitable. I wonder how long it will be before some of his investors start looking for something a little more grounded.

I think there is no stopping it: “AGI” is now considered a question of national security, and it is seen as essential to fighting China.

The podcast is really long; the first half of this video summarizes it well (ignore its clickbait thumbnail):

Here’s a reality check against the AGI predictions (basically Earth lacks the resources to scale the models as exponentially as predicted):

A ‘common fallacy’ of NHS leaders is the assumption that new technologies can reverse inequalities, the authors add. The reality is that tools such as AI can create ‘additional barriers for those with poor digital or health literacy’.

‘We caution against technocentric approaches without robust evaluation from an equity perspective,’ the paper concludes.

There’s reference to advanced radiology, for example, to support cancer diagnoses. Which comes at the expense of those who don’t have access to it. And to your point, Venerable:

Published in the Lancet Oncology journal, the paper instead argues for a back to basics approach to cancer care. Its proposals focus on solutions like getting more staff, redirecting research to less trendy areas including surgery and radiotherapy, and creating a dedicated unit for technology transfer, ensuring that treatments that have already been proven to work are actually made a part of routine care.

In the US, the health care system as a privately-operated industry is wholly removed from US government-funded AI research … so I don’t see an opportunity there to divert funds in a meaningful way. (Saying this for others to consider; I know you know this.)

And the amount of US government funding for AI research pales in comparison to industry … the only R&D research funding that the US government can use in a transformative way is in the defense industry – which, of course, they do, with some trickle-down effects. Sigh … making bombs and bomb infrastructure while opportunists tease out civil-use technology in a wink-wink kind of way.

And, to a smaller extent, in medical science.

In a Faustian kind of deal that looks pretty good in that regard, the US health care system is shielded from bionic duckweed (or futurewashing) because speculation about AI potential doesn’t generate ROI – unacceptable to stockholders no matter how attractive futurists try to make AI look in a GDP-focused way.

By contrast, the US power and transportation infrastructures are wholly dependent on federal and state-level funding for maintenance and transformation. Here I have noticed the futurewashing … consistent with the article’s reference to holding off high-speed rail investments in the UK. Also, the dilemma facing private industry and the Elon Musks of the world who must depend on these infrastructures to do what they envision.

Let’s all remember how the state of Texas, wanting to be “its own man”, decided to operate its power infrastructure as a separate grid from federally funded hub-and-spoke infrastructures throughout the rest of the country. That hasn’t really worked out. So the Elon Musks of the world need a Faustian kind of deal with the US government; alas, it’s not available. Oh yes, that’s why they are now investing in nuclear fusion to power their own greedy data centers (!) (Seriously??)

It would be interesting to model the investment costs to demonstrate this. On second thought, no – it’s going down a rabbit hole. Bhante spent two months releasing his “investment” analysis and I don’t think we’ll see anything else in terms of its thoroughness and pointedness in the near future. (Or maybe I’m out-of-the-know, dead-wrong, and the other papers are making their way around academia.)

Are those securing the necessary funds and building out the AI capacity the same people who would otherwise be translating? All three of those activities require different skillsets, IMO. So, I wonder whether the AI effect is, instead, de-motivating would-be translators because their work would seem irrelevant.

Who are those would-be translators, where are they incubating now, and how is that being supported in material ways? Much gratitude to SuttaCentral for helping sort this out.

:pray:t3: :elephant:

I have to think the smoke-and-mirrors approach to potential ROI appeals to a select few capital investors who aren’t held accountable until something really awful happens. Like 2008. I think Goldman Sachs generally spits out fancy numbers without ever really saying anything. But people like to think they are in-the-know because they are paying a good sum for Goldman Sachs “intellectual property”.

:elephant: :pray:t3:
