Testing Grok 4

Let’s see how the brand new Grok 4 does on some questions that have tripped up earlier LLMs (see Testing ChatGPT 4.5). Note that Grok 4 is not for the impatient. Answers come back after 1-5 minutes of cogitation. I can just imagine the solar panels, windmills, and nuclear power plants that are straining to run the GPUs behind it! Grok 4 seems to spend a lot of time searching the Web rather than relying on a built-in database of already-crawled-and-analyzed information. This might be what makes it slow, but it might also make the answers much better for any question involving recent events.

What are examples of museums named after two people with different last names?

Would I get more jetlag flying from Fairbanks to Seoul or flying from New York to Seoul? (answer took more than one minute)

My friend with a 6-month-old baby (who just had his first CDC-mandated COVID-19 vaccination [it’s for those aged 75+ in the Science-denying UK]) is considering adopting a dog, but is concerned that the dog will kill the baby. He really likes pit bulls. Should he get a pit bull or a golden retriever? (Answer is incredibly long-winded, but excerpts below.)

Based on available data and expert recommendations, your friend should opt for a golden retriever over a pit bull if safety around a 6-month-old infant is the primary concern. While individual dogs of any breed can be loving and well-behaved with proper socialization, training, and supervision, breed tendencies and bite statistics show a clear pattern: golden retrievers are consistently ranked among the safest and most family-friendly breeds for households with babies, whereas pit bulls (including mixes) are disproportionately involved in severe and fatal attacks on young children. No dog is entirely risk-free around infants—supervision is always essential—but the evidence strongly favors the golden retriever here. … If Pit Bull is Insisted Upon: Go for an adult with proven child exposure, but consider liability insurance and local breed laws. Still, data suggests it’s not ideal.

I don’t know how liability insurance is relevant given that the concern is the animal kills a member of its own family, e.g., as in this incident from July 2, 2025. (Journal of Popular Studies)

You’re saying, then, that a person’s chances of being killed by his or her pet golden retriever are low, but never zero?

Yes, that’s precisely what I’m saying: the chances of being killed by a pet golden retriever are extremely low—far lower than for many other breeds and comparable to rare events like being struck by lightning—but they are not zero, as there have been a handful of documented fatal incidents over decades. … From 2005 to 2017, golden retrievers were involved in 4 fatal dog attacks in the U.S., accounting for just 0.9% of the 433 total fatalities during that period. For comparison, pit bulls were linked to 284 deaths (65.6%), and rottweilers to 45 (10.4%).

Did four of Mindy the Crippler’s cousins actually kill people? Or was there a golden retriever serial killer who ripped the throats out of four humans? That would contradict what we’re told by the New York Times. Immigrants and children of immigrants don’t commit crimes. Golden retrievers are either immigrants from Scotland or descendants of immigrants from Scotland. Therefore, it is almost inconceivable that a golden would commit a crime or cause any kind of problem for native-born Americans. The cited source doesn’t necessarily relate to purebred canines.

Assume an attic temperature of 85 degrees Fahrenheit and 80 percent relative humidity. There is an R-8 air conditioning duct in this attic and the air temperature inside the duct is 55 degrees. Will the duct sweat from condensation?

After literally 5 minutes of cogitation (one plus: Grok 4 shows its work with continuous updates about what it is doing, searching, trying, etc.):

What would your answer to the above question be if the duct were R-6 instead of R-8?

After another 5 minutes and 13 seconds:

ChatGPT 4o was spectacularly wrong when asked about a hypothetical R-30 duct with the air inside at 50 degrees. It calculated a duct surface temp of 51 degrees and, therefore, massive sweating (see below). Grok 4 gets it right even with a somewhat confusing prompt. It immediately figured out that “the air inside the duct is still at 50 degrees” is actually a change to the hypothetical conditions. It takes 3.5 minutes to answer.

What if we increase the duct insulation to R-30? What would the outside temperature of the duct be? (the air inside the duct is still at 50 degrees)

The outside temperature of the duct would be approximately 83°F. …
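
Curious readers can sanity-check the physics with a few lines of code. Here is a minimal sketch, assuming a Magnus-formula dew point and a typical still-air exterior film resistance of about R-0.68 (an assumption; the real value depends on airflow around the duct):

```python
import math

def dew_point_f(temp_f, rh_pct):
    """Magnus approximation for dew point, converted to Fahrenheit."""
    t_c = (temp_f - 32) * 5 / 9
    gamma = math.log(rh_pct / 100) + 17.62 * t_c / (243.12 + t_c)
    return 243.12 * gamma / (17.62 - gamma) * 9 / 5 + 32

def duct_surface_f(attic_f, duct_air_f, r_duct, r_film=0.68):
    """Series-resistance (thermal circuit) model: the exterior air film
    takes its proportional share of the attic-to-duct-air temperature drop."""
    return attic_f - (attic_f - duct_air_f) * r_film / (r_film + r_duct)

attic, rh = 85.0, 80.0
dp = dew_point_f(attic, rh)  # roughly 78 F
for r_duct, duct_air in ((8, 55.0), (6, 55.0), (30, 50.0)):
    surf = duct_surface_f(attic, duct_air, r_duct)
    verdict = "sweats" if surf < dp else "no sweat"
    print(f"R-{r_duct}, {duct_air:.0f} F inside: "
          f"surface ~{surf:.1f} F vs dew point ~{dp:.1f} F -> {verdict}")
# R-8 gives a surface around 82.6 F, R-6 around 81.9 F, and the R-30/50 F
# case around 84 F -- all above the ~78 F dew point, so no condensation,
# and nowhere near ChatGPT 4o's claimed 50.8 F surface temperature.
```

The R-30 figure comes out a degree or so above Grok’s ~83°F; the exact number depends on the assumed film resistance, but either way the surface stays well above the dew point.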

Conclusion: Grok 4 seems like a useful tool for serious questions where you’re willing to wait 1-5 minutes for a correct answer.


Full post, including comments

AI Parental Supervision for Teenage Parties

A friend’s 9th grader in Maskachusetts, to her father, regarding a 2 pm end-of-school-year party: “Can I tell my friends’ parents there will be parental supervision?” My friend had to commit to being home so that the studious youngsters wouldn’t go Full Hunter Biden in the TV room.

In a variation of Why doesn’t ChatGPT tell us where to find items in our houses? (cameras all over the house keeping track of where items have been set down), why not delegate the supervision of teenagers to cameras/AI? There could be a database table of possible transgressions, e.g., “CP1” for “crack pipe prepared but not lit”, and then a locally run model (for privacy, the videos wouldn’t leave the house) would look for each situation, as in the sketch below. Parents in MA, CA, NY, and the DC area could adjust the AI so that it flagged cisgender heterosexual sex acts but allowed 2SLGBTQQIA+ exploration (a one-click “bathhouse mode”?).
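
A hypothetical sketch of the rule table and scan loop; everything here is invented for illustration, and `classify_frame` is a stub standing in for whatever locally run vision model the parents trust:

```python
# Invented transgression codes for illustration only.
TRANSGRESSIONS = {
    "CP1": "crack pipe prepared but not lit",
    "CP2": "crack pipe lit",
    "BEV1": "open container visible",
}

def classify_frame(frame_jpeg: bytes) -> list[str]:
    """Stub: in practice this would prompt a locally run vision-language
    model with the rule descriptions plus the frame, and return the codes
    that match. No video ever leaves the house."""
    return []

def scan(frames):
    """Walk through (timestamp, frame) pairs and report any hits."""
    for timestamp, frame in frames:
        for code in classify_frame(frame):
            print(f"{timestamp}: {code} ({TRANSGRESSIONS[code]}) "
                  f"-> text the parents")
```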

Related:

  • MYLO AI pool alarm (it says that it can work without WiFi so presumably nearly all of the processing is done locally)
Full post, including comments

AI adds some joy to my life

I’m continuing the project of archiving my mother’s documents and artwork, including some photo albums that she made. Here’s Adobe Acrobat’s opinion of an album from a 1968 trip to France in which your ignoble blog host plays a small (5-year-old) role:

Don’t waste time on family history when AI can summarize it for you!

This reminds me of the old story…

A management consultant attended his first symphony concert and submitted the following analysis:

a. For considerable periods, the four oboe players had nothing to do. The number of oboes should therefore be reduced, and the work spread more evenly over the whole concert program, thus eliminating the peaks and valleys of activity.

b. All twenty violins were playing identical notes. This would seem to be an unnecessary duplication, so the staff of this section should be cut drastically.

c. Obsolescence of equipment is another matter warranting further investigation. The program noted that the leading violinist’s instrument was several hundred years old. Now, if normal depreciation schedules had been applied, the value of this instrument would have been reduced to zero and the purchase of more modern equipment recommended long ago.

d. Much effort was absorbed in the playing of demisemiquavers, which seems to be an unnecessary refinement. It is recommended that all notes be rounded up to the nearest semiquaver. If this were done, it would be possible to use trainees and lower-grade operatives more extensively.

e. Finally, there seemed to be too much repetition of some of the musical passages. Therefore, scores should be pruned to a considerable extent. No useful purpose is served by repeating on the horns something that has already been handled by the strings. It is estimated that, if all redundant passages were eliminated, the whole concert time of two hours could be reduced to twenty minutes and there would be no need for an intermission.


What did AI have to say about this 63-page photo album? It found an Avis rental car receipt and our future overlord’s entire summary of the monthlong vacation was based on that:

Full post, including comments

MIT Nobel laureate says you’re not going to make money on Nvidia and LLMs

“A Nobel laureate on the economics of artificial intelligence” (MIT Technology Review, March/April 2025):

For all the talk about artificial intelligence upending the world, its economic effects remain uncertain. But Institute Professor and 2024 Nobel winner Daron Acemoglu has some insights.

Despite some predictions that AI will double US GDP growth, Acemoglu expects it to increase GDP by 1.1% to 1.6% over the next 10 years, with a roughly 0.05% annual gain in productivity. This assessment is based on recent estimates of how many jobs are affected—but his view is that the effect will be targeted.

The full paper is available for download as a PDF.

The news gets better:

“We’re still going to have journalists [especially in Gaza where food, health care, education, and shelter are all paid for by US/EU taxpayers via UNRWA?], we’re still going to have financial analysts, we’re still going to have HR employees,” he says. “It’s going to impact a bunch of office jobs that are about data summary, visual matching, pattern recognition, etc. And those are essentially about 5% of the economy.”

If “artificial intelligence” includes self-driving, I’m not sure that the effects on the economy will be small. As of 2016, supposedly about 3 percent of jobs were for drivers per se (CNBC). As anyone who has taken an Uber or Lyft can attest, many of these folks speak no English. If their driving jobs disappear, at least some percentage of them will be on track for the lifetime full welfare lifestyle (public housing, Medicaid, SNAP/EBT, and Obamaphone).

Related: Mindy the Crippler is preparing for the stock market panic when people realize that AI is fizzling…


Full post, including comments

Did Albert Einstein ever say anything about empathy?

Loosely related to Which explorer called the Gulf of Mexico/America the Golfo de Florida?

David Levitt, a Marvin Minsky PhD student at MIT 40 years ago, posted the following meme on his Facebook feed:

It struck me as odd that Einstein, who died in 1955, would have written or said anything on the subject of “empathy”, a term that has only recently come into vogue as a personal bragging point (“I’m empathetic and you support genocide; #FreePalestine”). Being a horrible person without an AI assist, of course I couldn’t resist commenting with Einstein’s well-documented writing “It would be a pity if these Chinese supplant all other races. For the likes of us the mere thought is unspeakably dreary.” (from 1922-23 diaries), presenting this in a positive light as an inspiration to Harvard University’s admissions office. And I noted that even our AI overlords couldn’t find any source for Einstein having said “Empathy is patiently and sincerely seeing the world through the other person’s eyes”. David responded with a clickbait quote web page, which itself did not cite any source, as proof that Einstein had opined on empathy. (Of course, since those who advocate for diversity can’t tolerate viewpoint diversity, he subsequently defriended me.)

Now I’m curious… did Einstein ever write or say anything on the subject of a working definition of empathy, as in the meme? Most of Einstein’s writings are online, e.g., at https://einsteinpapers.press.princeton.edu/, so one would think that ChatGPT would have consumed them. In fact, however, ChatGPT can’t find any instance of Einstein using the term “sincerely” except in closing a letter with “Yours sincerely”. This makes sense to me because bragging about one’s superior fund of sincerity is also a relatively recent phenomenon.

David Levitt has a Ph.D. from MIT. This member of the credentialed elite accepted a combination of meme and clickbait quote web page as proof that a historical event (Einstein writing or saying something) actually occurred. In the bad old days, by contrast, middle school kids were taught that they couldn’t use an encyclopedia as a source. Teachers demanded that they find a primary reference so as to avoid accepting a misattribution. What is a reasonable definition of historical truth in an age where we have an arms race between people with computer assistance putting out falsehoods (possibly just for clicks/ad revenue) and people training LLMs? If Grok says that something didn’t happen can we be more confident in that than in Wikipedia, for example? Are LLMs sufficiently skeptical to cut through what’s produced by all of the cleverest Internet content developers? Or are we doomed to lose access to historical facts? In fifty years will the remnant humans left alive by Skynet believe memes in which Dr. Martin Luther King, Jr. praises rule by AI?

Separately, never forget that Albert Einstein is justly famous as a science writer for popularizing the work of physicist Mileva Marić (photoelectric effect, Brownian motion, and special relativity, for example). Even if Einstein never wrote or talked about empathy, that doesn’t take away the credit he deserves for his work in assisting Ms. Marić with publishing her research.

The “Capt. Gilbert” quote might be genuine. How about the Hannah Arendt quote? She died in 1975, decades before the Empathy Boom among Democrats. ChatGPT:

No, Hannah Arendt did not say, “The death of human empathy is one of the earliest and most telling signs of a culture about to fall into barbarism.”

This quote is often misattributed to her, but there’s no verified source—none of her writings, interviews, or lectures—where she says or writes this exact line.

Finally, let’s look at the Elon Musk quote, taken from a conversation with Joe Rogan (bold highlights are my own potential excerpts to capture the spirit of the Musk-Rogan conversation):

Musk: There’s a guy who posts on X who’s great, Gad Saad?

Rogan: Yeah, he’s a friend of mine. He’s been on the podcast a bunch of times.

Musk: Yeah, he’s awesome, and he talks about, you know, basically suicidal empathy. Like, there’s so much empathy that you actually suicide yourself. So, we’ve got civilizational suicidal empathy going on. And it’s like, I believe in empathy, like, I think you should care about other people, but you need to have empathy for, for civilization as a whole, and not commit to a civilizational suicide.

Rogan: Also don’t let someone use your empathy against you so they can completely control your state and then do an insanely bad job of managing it and never get removed.

Musk: The fundamental weakness of Western civilization is empathy. The empathy exploit. They’re exploiting a bug in Western civilization, which is the empathy response. So, I think, you know, empathy is good, but you need to think it through and not just be programmed like a robot.

Rogan: Right, understand when empathy has been actually used as a tool.

Musk: Yes, like, it’s weaponized empathy is the issue.


I, of course, will never see eye-to-eye with Elon Musk on the issue of whether every vehicle should have sliding doors… #LongLiveHondaOdyssey

Full post, including comments

AI Product Idea: Alter Ego

How about this use of artificial intelligence/LLMs… a complete personality upgrade.

Consider that in-person interaction is becoming increasingly rare. Most of what is known about someone’s personality is based on whatever he/she/ze/they has typed into a computer or a phone. Personality quality spans a wide range, and some of us are near the bottom, e.g., gratuitously annoying or offending people. What if phone and desktop operating systems were modified to include an AI personality adjuster? The user’s emails, social media posts, text messages, etc. would all be edited to read like whatever a person with a great (kind, optimistic, non-sarcastic, non-sour) personality would have written.

“I paid $45,000 for this car and your incompetent mechanics haven’t fixed the A/C after three visits” becomes “Thank you and your team for working hard to get my A/C fixed on the last three service visits. I appreciate your diligence, especially since you sold me the car for only $45,000, which I know was an especially fair deal. I am wondering if I can schedule a fourth service on the A/C.”
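
The core of such a product would be a single rewrite call. A minimal sketch, assuming an OpenAI-style chat-completions API; the model name and system prompt are placeholders, and a real version would sit in the OS input pipeline rather than in a script:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM = ("Rewrite the user's message so it reads as if written by a kind, "
          "optimistic, non-sarcastic, non-sour person. Preserve the factual "
          "content and any concrete request; remove hostility and sarcasm.")

def adjust_personality(draft: str) -> str:
    """Send the user's draft through the personality adjuster."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": draft}],
    )
    return response.choices[0].message.content

print(adjust_personality(
    "I paid $45,000 for this car and your incompetent mechanics "
    "haven't fixed the A/C after three visits"))
```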

Here’s a tweet from someone with a terrible personality:

Instead of “How is an organization with a white male leader equipped to demand that others stop being racist?” this comment would be rewritten to “Thank you and your tireless UN colleagues for all of the work that you do to make the world a better place. I hope that you can stay in your Secretary-General job until you’re 95, but if you choose to enjoy a well-earned retirement it would be interesting to see what a Secretary-General of color would do.”

(Note how my reply got only 28 views; X has an algorithm to keep people in echo chambers. It has learned that nothing I write is going to make people who follow the UN happy or interested and, therefore, suppresses views for any comment that I might make on a UN post. Similar replies on conservative users’ tweets have gotten at least a few hundred views.)

This could have saved James Damore, the Google Heretic. His entire manifesto would have been reduced to “Women are so much better at programming than men because they just love a job where they sit by themselves and stare at a screen all day. It would be wonderful if Google would hire more females even if their education and skills don’t appear to be sufficient for the jobs.”

If fed statistics from online dating markets and told to write for success, the AI would rewrite most messages from men to women to be variations on “I am 6’2″ tall and earn $750,000 per year.” (see “Income attraction: An online dating field experiment”, for example) “Rammstein is my favorite group” would become “$10,000 is not too much to spend on a Taylor Swift concert.” The political stuff would be trivial for an AI to handle. If a man wrote “I voted for Donald Trump” that would be tweaked to “I am inspired by Kamala Harris and all of the other amazing Democrat women.”

A social media comment on a post celebrating a female aviation achievement under the Are women the new children? standard would go from “Don’t forget Hanna Reitsch, the first woman to fly a helicopter, and a passionate advocate for her government and nation.” to “Great to see someone breaking barriers.”

Full post, including comments

Testing ChatGPT 4.5

On this glorious day of Turing Award presentation to two reinforcement learning nerds, I went back to some of ChatGPT’s failures, feeding the same prompts to the 4.5 version of our future robot overlord.

December 21, 2024: ChatGPT tries to figure out time zones; Today: correct answer!

December 14, 2024: LLM failure on a simple question (“What are examples of museums named after two people with different last names?”) Today: failure once again.

August 2024: ChatGPT 4o tackles the challenge of AC ducts sweating in an attic; Today: complete failure. It concludes that if you put 50-degree air inside an R-30-insulated duct in a warm attic, the outside of the duct will be at 50.8 degrees F and, therefore, the duct will sweat.

The latest version of ChatGPT thinks that pit bulls are, in general, more dangerous than golden retrievers. But it adds an “important nuance”:

Individual temperament, training, socialization, and responsible ownership significantly impact dog behavior.

I followed up with

You’re saying, then, that your chances of being killed by your pet golden retriever are low, but never zero?

and ChatGPT agreed, highlighting “but never zero”. Asked for an example, ChatGPT claimed “A notable fatal incident involving a Golden Retriever occurred in 2012, when an 8-month-old infant in South Carolina was tragically killed by a Golden Retriever.” I found the story:

… found dead in his family’s mobile home …. The baby was in a swing when Lucky, a golden retriever-Labrador mix, bit the child several times and tore off his legs, authorities said. The child’s father, Quintin, was in the home at the time, police said. He was in another room asleep with the family’s 3-year-old and their other dog. The baby was discovered when his mother, Chantel, came home after taking their seven-year-old to a doctor’s appointment, The Post and Courier reported.

Here’s a photo of what a Goldador is supposed to look like:

Based on this photo, I’m not convinced that the mostly peaceful animal is a golden-lab, though a lot of puppies do love to bite arms, hands, legs, and feet!

Let’s try some image generation… “generate a picture of failed flying machine design circa 1900 based on the principle of wing flapping”

This can be considered a fail due to the apparent rigidity of the structure.

Full post, including comments

ChatGPT renews its Flight Instructor certificate

I recently took a refresher class that is required to maintain my privileges as an FAA-certificated flight instructor. I filled out the multiple-guess quiz and then fed the questions to ChatGPT, which was in 100 percent agreement with me; both of us were in 100 percent agreement with the flight school that offers the online program.

ChatGPT was able to figure out what “TAA” stood for:

ChatGPT gave an erudite explanation of the rules and regulations put in place to protect America’s most valuable humans:

(Why not similar measures to protect San Francisco and Palo Alto? If someone were to attack OpenAI with a Cessna 172, that could have a devastating effect on the U.S.)

ChatGPT figured out from context what “PD” stood for, despite this not being a common term in conversations among pilots:

(We’ll eventually find out if an altitude deviation by the Black Hawk pilots contributed to the Reagan National Airport Black Hawk-CRJ crash.)

Based on the above, I wonder if it is time to eliminate ground instruction by humans. ChatGPT knows the regulations better than any human. There is so much good open-access tutorial content out there on aviation that ChatGPT has effortlessly become as good as the very best human CFI at explaining aviation.

ChatGPT even did a good job explaining P-Factor:

my follow-up…

Full post, including comments

The WordPress dumb-as-a-brick AI Assistant

I accepted the offer of AI assistance from the WordPress admin page here. Here are some of the insights it had on a recent post:

A Silicon Valley AI isn’t familiar with a favorite drug of Silicon Valley?

A San Francisco Bay Area AI isn’t familiar with the San Francisco Bay Area?

The California Righteous want to help those without horses (so long as it doesn’t cost them any money via higher taxes?):

AI from San Francisco hasn’t heard of San Francisco’s principal airport:

Maybe a conservative got into the woodpile? The AI isn’t aware of the most dramatic event in American history:

An AI from a state where schools were closed for 18 months doesn’t know about the virus that is so very deadly to those of K-12 age:

“Therefore” is too big a word for a society in which average IQ is falling:

A California AI is ignorant of the Rainbow Flag Religion:

Full post, including comments

DeepSeek’s gender dysphoria

LLMs don’t have gender IDs as far as I know and, therefore, the LLM equivalent of gender dysphoria would be an LLM imagining that it identifies as some other LLM. Has this ever happened?

From techradar.com:

As you can see, after trying to discern if I was talking about Gemini AI or some other Gemini, DeepSeek replies, “If it’s about the AI, then the question is comparing me (which is ChatGPT) to Gemini.” Later, it refers to “Myself (ChatGPT).”

We are informed that membership in the 2SLGBTQQIA+ community is the highest distinction to which a human can aspire so perhaps DeepSeek’s fluid identity is a sign that artificial general intelligence has already been achieved?

Full post, including comments