Some friends have put together The AI Revolution in Medicine: GPT-4 and Beyond, a thought-provoking book released today. Health care is nearly 20 percent of our economy (compared to 4 percent in Singapore) and a higher percentage of our tax dollars (Happy Official Tax Day, though April 18 is the deadline this year; the opioid makers thank you for your efforts in the workplace). The book asks the important question “What will LLMs do for an industry to which Americans have decided to devote one working day per week?”
First, let’s ask why we think GPT-4, famous for its writing talents, might matter for doctors. Doctors aren’t primarily writers, after all. Many physicians, however, spend time attempting diagnoses, and diagnosis is an information-processing activity. The patient describes his/her/zir/their symptoms and the doctor matches these to known patterns, possibly asking follow-up questions or ordering tests to confirm or reject explanations. In other words, exactly what computer programs have traditionally done.
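(What does “traditionally done” look like? Below is a toy sketch in the spirit of 1970s rule-based expert systems. The disease/symptom table is invented for illustration, not medical advice.)

```python
# Toy version of symptom-to-pattern matching, in the style of a rule-based
# expert system. The disease/symptom table is invented for illustration.
KNOWN_PATTERNS = {
    "influenza": {"fever", "cough", "body aches", "fatigue"},
    "strep throat": {"fever", "sore throat", "swollen lymph nodes"},
    "migraine": {"headache", "nausea", "light sensitivity"},
}

def rank_diagnoses(reported_symptoms):
    """Score each candidate by the fraction of its pattern that was reported."""
    reported = set(reported_symptoms)
    scores = {
        disease: len(reported & pattern) / len(pattern)
        for disease, pattern in KNOWN_PATTERNS.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_diagnoses(["fever", "cough", "fatigue"]))
# -> [('influenza', 0.75), ('strep throat', 0.33...), ('migraine', 0.0)]
```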
Second, let’s ask GPT-4 for its own opinion: “Explain why GPT-4 will change the practice of medicine and the experience of patients.” Our future overlord’s response:
GPT-4 has the potential to significantly change the practice of medicine and the experience of patients in various ways. Here are some key areas where it can have a transformative impact:
- Enhanced medical diagnosis: GPT-4’s advanced language processing capabilities enable it to understand complex medical information and provide diagnostic suggestions to doctors. This can help in the early detection of diseases, improving patient outcomes, and potentially saving lives.
- Personalized treatment recommendations: GPT-4 can analyze an individual’s medical history, genetic data, and other relevant factors to provide personalized treatment recommendations. This can lead to more targeted and effective therapies for each patient.
- Streamlining administrative tasks: GPT-4 can help automate and streamline various administrative tasks, such as scheduling appointments, managing patient records, and handling insurance claims. This will allow healthcare professionals to focus more on patient care and less on paperwork.
- Health education: GPT-4 can act as a virtual health educator, providing patients with accessible and accurate information about their conditions, medications, and self-care strategies. This will empower patients to take a more active role in managing their health.
- Mental health support: GPT-4 can serve as a virtual therapist, offering support to individuals struggling with mental health issues. It can engage in conversation, provide coping strategies, and help track mood and behavior patterns. While it cannot replace human therapists, it can be a valuable supplemental resource.
- Telemedicine: GPT-4 can enhance telemedicine platforms by providing real-time language translation services and facilitating communication between patients and healthcare providers across different languages and cultures. This can expand access to healthcare services for patients in remote or underserved areas.
- Research and development: GPT-4 can assist in the analysis of large volumes of medical data, identifying patterns and relationships that may be difficult for humans to detect. This can help accelerate the development of new treatments and technologies in the medical field.
- Ethical considerations: The integration of GPT-4 in healthcare can raise various ethical concerns, such as data privacy, algorithmic bias, and the potential for overreliance on AI. However, addressing these challenges and ensuring the responsible use of GPT-4 can lead to a more equitable and effective healthcare system.
Overall, GPT-4 has the potential to revolutionize the practice of medicine and improve the experience of patients by enhancing diagnostic accuracy, personalizing treatment, streamlining administrative tasks, expanding access to healthcare services, and more. As with any new technology, it is essential to address ethical concerns and ensure the responsible and equitable use of AI in medicine.
(Note ChatGPT’s passion for equity! If we put ChatGPT in charge, it might read “Racial Disparities in Maternal and Infant Health” (KFF), learn that “Black and American Indian and Alaska Native (AIAN) women have higher rates of pregnancy-related death compared to White women” and decide to euthanize 15 percent of white pregnant people who come to the hospital. That would be the simplest and fastest way for the AI to accomplish its equity goal of equalizing health outcomes by skin color.)
A primary care doc neighbor says that he uses ChatGPT every day and would pay $1,000 per month for it if he had to. When confronted with a massive hospital discharge note that he is required to summarize in order to get paid, for example, he will ask ChatGPT to summarize it. He also uses ChatGPT to draft follow-up instructions. He says that ChatGPT is surprisingly good at interpreting blood tests. It is interesting to reflect that ChatGPT is useful to him despite the fact that he can’t feed it his institution’s electronic medical record. So the AI doesn’t already know each patient’s full history.
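For the curious, the summarization trick requires almost no code. Here’s a minimal sketch using the early-2023 OpenAI Python library; the model name, prompt wording, and five-bullet format are my assumptions, not the neighbor’s actual setup, and the note would of course need to be de-identified before leaving the institution:

```python
# Minimal sketch of a discharge-note summarizer (early-2023 OpenAI library).
# The model name and prompt are assumptions.
import openai

openai.api_key = "sk-..."  # your API key here

def summarize_discharge_note(note_text: str) -> str:
    """Boil a many-page discharge note down to five bullets for the PCP."""
    response = openai.ChatCompletion.create(
        model="gpt-4",
        temperature=0,  # favor determinism over creativity
        messages=[
            {"role": "system",
             "content": "You are a clinical assistant. Summarize the following "
                        "hospital discharge note in five bullet points for a "
                        "primary care physician."},
            {"role": "user", "content": note_text},
        ],
    )
    return response.choices[0].message.content
```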
Let’s return to The AI Revolution in Medicine: GPT-4 and Beyond… starting with “Chapter 4: Trust but Verify” from the doctor (Isaac Kohane) and “Chapter 5: The AI-Augmented Patient” from the science journalist (Carey Goldberg).
In “Trust but Verify,” Dr. Kohane explores the question of how we would put GPT-4 through a clinical trial. Other computer programs have passed clinical trials and received government approval, so why not GPT-4? The typical clinical trial is narrow, he points out, while GPT-4’s range of function is wide. Just as an FDA trial probably couldn’t be done to approve or disapprove an individual doctor, it seems unlikely that an FDA trial can approve or disapprove an LLM; therefore, AI programs are most likely destined to be superhuman partners with human docs rather than replacements. The chapter contains a couple of concrete scenarios in which the doctor compares his own work on some difficult cases to GPT-4’s, and the AI does fantastically well.
In “The AI-Augmented Patient”, the journalist points out that the people who’ve been asking Dr. Google for advice will be the heavy users of Dr. GPT-4. She highlights that the “COVID ‘misinfodemic’ show[s] that it matters which humans are in the loop, and that leaving patients to their own electronic devices can be rife with pitfalls.” Implicit in the foregoing is the assumption that public health officials are the best human decision-makers. What if the take-away from coronapanic is the opposite? Credentialed Americans refused to read the WHO pandemic management playbook, refused to process any information coming from Europe unless it fit their preconceived ideas about lockdowns, school closures, and mask orders, and refused to consider population-wide effects such as risk compensation. A computer program wouldn’t have any of these cognitive biases.
What happened when people expanded their sources of information? One notable example: Marjorie Taylor Greene turned out to be a better virologist than Dr. Fauci. In August 2021, MTG was suspended from Twitter for noting that the available COVID-19 vaccines did not prevent infection by and spread of SARS-CoV-2 and that masks were not effective. Virologist Greene’s statements were labeled “false” as a matter of Scientific fact by the journalists at the New York Times in January 2022 and then proven correct soon afterwards with a huge study in Spain and the Cochrane review. Plenty of those killed by COVID would be alive today if they’d listened to Marjorie Taylor Greene’s advice rather than the CDC’s. The elderly/vulnerable would have stayed safe at home, for example, instead of entering public indoor environments with masks on.
I’m optimistic that GPT-4 will do better in many areas than American medical officialdom because its judgment won’t be tainted by groupthink and “we’ve always done it this way”. We’ve often had standard of care disagreements with the Europeans, for example, and the Europeans have ended up being correct. The latest discrepancy in Science is that Denmark suggests a newborn get his/her/zir/their first COVID shot at age 50 (in the year 2073) while the CDC suggests four shots over the next 12 months. We might know who was right in 5-10 years, but an orthodox American MD isn’t supposed to question the CDC and would never ask “Are the Danes right?” while GPT-4 surely would be aware of the Danish MD/PhDs’ conclusions.
As long as we’re on the subject of every American’s favorite disease… later in the book, Kevin Scott, the CTO of Microsoft whose degree is in computer science, talks about the epic stupidity of physicians in rural Virginia. Not only did they fail to immediately prescribe Paxlovid to his tested-positive brother, but they actively refused to prescribe it when the life-saving experimental drug was requested by name (cue the Super Bowl ad for the still-not-FDA-approved drug). Here’s a photo of the West Coast computer programmer who claims to know more about preventing death-by-COVID than doctors in Virginia’s Trump Belt:
Given the scattered nature of American medical records, I think that a patient who is diligent about saving all of his/her/zir/their tests and logging symptoms, and who then runs a local copy of GPT-4 over the result, will be way ahead in the diagnosis Olympics. There is a limit to what a doctor can do in a 15-minute visit, and the typical doctor can see only a fraction of a patient’s medical record (because tests and treatments have been done at other institutions and the $trillions that Americans have spent on electronic health records have not yielded any practical integration).
What will this do to demand for medical services? The chapter quotes “Dr. Jim Weinstein, a leading expert on health equity now at Microsoft” as saying that “[GPT-4] could be a particularly valuable partner for the 100 million Americans on Medicaid, and it could cut unnecessary cost and waste across the board by helping patients understand more about their options.” Personal guess: GPT-4 will drive demand and therefore costs in the opposite direction. GPT-4 already covers its ass by providing the worst-case scenarios. Patients will learn that their “options” include aggressive rare cancers and calling 911.
Let’s try a concrete example. Suppose that you live in Maskachusetts and your female-led Social Equity marijuana home delivery is running late, so you drink half a bottle of rum while waiting for the daily one-ounce high-potency bong refill. The next morning, despite all of the healing cannabis that you consumed, you wake up on the bathroom tile not feeling your best and ask GPT-4, “I have a lump on my head. What are the possible causes?” Here’s what it says:
There are several possible causes of a lump on the head, including:
- Head injury: A bump or lump on the head may be the result of a head injury, such as a concussion, hematoma, or skull fracture.
- Scalp infections: Infections of the scalp, such as folliculitis or cellulitis, can cause swelling, redness, and lumps on the scalp.
- Cysts: A cyst is a sac filled with fluid or other material that can develop under the skin on the scalp.
- Lipomas: A lipoma is a slow-growing, benign tumor made up of fat cells that can develop under the skin on the scalp.
- Bone deformities: Certain conditions, such as Paget’s disease or fibrous dysplasia, can cause abnormalities in the bones of the skull that may lead to lumps or bumps on the head.
- Tumors: In rare cases, a lump on the head may be a sign of a cancerous tumor.
It’s important to have any new lumps or bumps on the head evaluated by a healthcare provider to determine the cause and appropriate treatment. If the lump is the result of a head injury and you experience symptoms such as loss of consciousness, dizziness, or severe headache, seek emergency medical attention immediately.
The patient who didn’t talk to GPT-4 might wait a few days before seeking healthcare services. After all, plenty of Looney Tunes characters recover quickly from head lumps.
The patient who reads the GPT-4 output, however, is explicitly told that he/she/ze/they must be “evaluated by a healthcare provider” and that injury from an alcohol-and-cannabis-assisted encounter with tile is just 1 out of 6 possibilities. The idea that “the 100 million Americans on Medicaid”, who have $0 copays for going to the emergency room, will ignore GPT-4’s explicit advice to go to the emergency room seems questionable.
Let’s see whether GPT-4 will back off from its advice to consume Medicaid funds if we give it more information. In response to the above, I typed “I do have a terrible headache” and GPT-4 hoisted the “seek medical attention” advice to its first sentence. After “I drank alcohol and smoked marijuana last night” was added, GPT-4 did not back off from the previous advice to rush to the hospital. In fact, it added “if you experience negative effects such as headaches, nausea, or anxiety, seek advice from a healthcare provider or a qualified mental health professional.” GPT-4 almost dialed 911 for me after the prompt “I woke up on the tile floor”: “If you woke up on the tile floor, it’s important to seek medical attention as soon as possible.”
In short, there was no situation in which GPT-4 could be persuaded to say “maybe wait a few days before running to the ED” or “If you wake up on the tile floor every morning, maybe reconsider your cannabis and alcohol consumption levels despite Massachusetts public health officials having declared these medically necessary and ‘essential’.”
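Mechanically, that escalating exchange is just a growing message list resent on every turn, since the chat API is stateless. A hedged sketch, assuming the early-2023 OpenAI library, an OPENAI_API_KEY environment variable, and the model name:

```python
# The chat API is stateless: to add "I do have a terrible headache" to the
# conversation, the client appends it to the message list and resends the
# whole history. The prompts are the ones from the experiment above.
import openai  # reads OPENAI_API_KEY from the environment

messages = [{"role": "user",
             "content": "I have a lump on my head. What are the possible causes?"}]

follow_ups = ["I do have a terrible headache",
              "I drank alcohol and smoked marijuana last night",
              "I woke up on the tile floor"]

for follow_up in follow_ups:
    reply = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    answer = reply.choices[0].message.content
    print(answer[:200])  # watch the "seek medical attention" advice escalate
    messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": follow_up})

final = openai.ChatCompletion.create(model="gpt-4", messages=messages)
print(final.choices[0].message.content[:200])
```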
Chapter 8, “Smarter Science”, looks at whether we might get better medicine faster. A pharma industry executive once told me that the statistical analysis that takes a pharma company’s consultant two weeks to conduct can take the FDA 18 months or more to redo. Maybe GPT-4 can speed up clinical trials and their evaluation. Dr. Kohane is the author of this chapter and he points out that GPT-4 is great at summarizing existing research, an important task in preparing for a trial. GPT-4 actually does a pretty good job of sketching out the clinical trial design for a proposed new drug.
Dr. Kohane notes that “preparing for a trial may involve humans reading tens of thousands of clinic notes. Conservative estimates place the cost of reading all of the relevant notes for a single patient at between $150 and $1,000. What if we could just have a large language model go through the entire electronic health record to look for eligible patients and exclude those that do not meet criteria? Having that capability could cut months to years from the timeline.”
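What Dr. Kohane describes might look something like the following hypothetical sketch, in which an LLM screens each patient’s notes against the trial criteria. The criteria, patient notes, and one-word-answer protocol are all invented for illustration:

```python
# Hypothetical trial-eligibility screen: ask an LLM whether each patient's
# notes satisfy the criteria. Criteria and notes are made up.
import openai

CRITERIA = """Inclusion: age 40-75, Type 2 diabetes, HbA1c above 8.0.
Exclusion: pregnancy, active cancer treatment, eGFR below 30."""

all_patient_notes = {  # {patient_id: concatenated clinic notes}
    "patient-001": "62 y.o. male. Type 2 diabetes. HbA1c 9.1. eGFR 55.",
    "patient-002": "38 y.o. female, 24 weeks pregnant. Type 2 diabetes. HbA1c 8.4.",
}

def is_eligible(notes: str) -> bool:
    reply = openai.ChatCompletion.create(
        model="gpt-4",
        temperature=0,
        messages=[{"role": "user",
                   "content": f"Trial criteria:\n{CRITERIA}\n\n"
                              f"Patient notes:\n{notes}\n\n"
                              "Answer with the single word ELIGIBLE or INELIGIBLE."}],
    )
    return reply.choices[0].message.content.strip().upper().startswith("ELIGIBLE")

eligible = [pid for pid, notes in all_patient_notes.items() if is_eligible(notes)]
print(eligible)  # expect ["patient-001"]
```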
Given that ChatGPT was released 4.5 months ago and the three-author book was finished on March 16, 2023 (says the Epilogue by Peter Lee), it is more reasonable to ask for interesting questions from The AI Revolution in Medicine: GPT-4 and Beyond than for definitive answers. I think that the book delivers (disclosure: I’m friends with two of the authors).
My prediction is that the next substantial boost in LLM utility will come from hooking these systems up to private databases. The business manager can ask “Based on reading every email and document authored here over the past three years, which current employee would be the best at handling tasks X, Y, and Z?” The lawyer can ask “Based on reading all 3 million pages the other side produced in discovery, which ones are relevant to the issues described in the Complaint and Answer?” The doctor can ask “Based on reading this patient’s entire medical record, what do you think could be causing the symptoms she is currently reporting?”
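How would that hookup work in practice? The recipe currently circulating is retrieval augmentation: embed each private record once, fetch the best matches for a question, and paste them into the prompt. A rough sketch, with early-2023 OpenAI model names and placeholder documents:

```python
# Rough retrieval-augmentation sketch over a "private database" of records.
# Model names are early-2023 OpenAI defaults; documents are placeholders.
import numpy as np
import openai

def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

documents = ["2019 MRI report: ...", "2021 lipid panel: ...", "2023 visit note: ..."]
doc_vectors = [embed(d) for d in documents]  # computed once, then cached

def answer(question: str, k: int = 2) -> str:
    q = embed(question)
    # ada-002 vectors are unit-length, so a dot product is cosine similarity
    ranked = sorted(zip(documents, doc_vectors),
                    key=lambda pair: float(q @ pair[1]), reverse=True)
    context = "\n\n".join(doc for doc, _ in ranked[:k])
    reply = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": f"Records:\n{context}\n\nQuestion: {question}"}])
    return reply.choices[0].message.content
```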
So, all corporate, medical and legal secrets in private databases will go to OpenAI and Microsoft?
Sounds like an interesting and worthwhile book. You would think that a similar study should be done for law, which, as an exercise in logical reasoning, also seems mostly amenable to AI. One difference is that there are fewer cost controls in law, where the interactions are typically between a knowledgeable practitioner and a clueless client, often spending other people’s money, with no insurance company as umpire. Also, none of the interactions are overseen by government bureaucracies tasked with reducing costs. The closest analog would be PI litigation, where an insurance company assumes the role of the government in restricting fees. Most low-level legal work could be easily automated because it involves few or simple fact patterns. Take family law divorce: in a no-fault state the relevant facts are few (where they married, how long, how many children, which parent brings home the dough and how much, present and projected future earnings, who the assigned judge is and his or her “generosity” in redistributing money and children), and AI should be able to predict the result far better than the lawyers, except if they know how to influence the judge financially.
> its judgment won’t be tainted by groupthink
A bold prediction! In a theocracy, everything is subject to doctrinal orthodoxy.
I’m surprised GPT-4 didn’t come up with the seventh option:
7. You have a lump on your head because your gender is different from the one assigned at birth.
Perhaps this will be an enhancement with GPT-5?
I noticed a lack of physical exam in your process here:
“The patient describes his/her/zir/their symptoms and the doctor matches these to known patterns, possibly asking follow-up questions or ordering tests to confirm or reject explanations. In other words, exactly what computer programs have traditionally done.”
Perhaps research through chart review will be greatly enhanced by AI, but until robotics improve to science-fiction levels so that machines can accurately examine patients (physical findings are “signs,” as opposed to patient-reported “symptoms”), ChatGPT falls under the quackery of “cookbook medicine”.
Sam: Good point. I guess ChatGPT is more like the telemedicine doctors that Californians use to get their healing marijuana and Paxlovid. But, on the other hand, ChatGPT can suggest tests for discriminating among diseases (just as telemed docs sometimes suggest tests).
Forget about diagnoses. What about GPT-4 handling the payments for Medicaid and CHIP? The government tells us that about one quarter of the payments are “improper” (fraud, I guess, is a different matter).
https://www.medicaid.gov/state-overviews/scorecard/what-rate-of-improper-payments-medicaid-chip/index.html
Did they actually use ChatGPT to write this book so quickly?