Teaching Naked

AI and Teaching Workshops

AI Models and Tools

For workshop prompts, click here.
For workshop slides (with 80+ pages of citations): click here
–Creating AI SIMULATIONS and CUSTOM BOTS
–Using AI with “THINKING” and “DEEP RESEARCH”
–Using SYSTEM PROMPTS

Clicking the links below will open the listed AI tools in new tabs in your browser. The list begins with LLM models (proprietary, open-source, regional and reasoning) and then a range of API tools (followed by agents and browser extensions)

PROPRIETARY FRONTIER MODELS: Here are some AI models you should know. They are from different companies, using different different neural networks and with different personalities and abilities. The paid versions are often substantially better and smarter.

THE BIG THREE

Claude.ai Lots of tokens for writing context and designed with a constitution, to do no harm. The new Sonnet 3.7 is a third generation model and currently free. Claude can run code, can analyze images and charts in pdfs and can access your Google Docs, Calendar, and Gmail, but only paid users (as of March 23) can actively search the web. 3.7 also has a reasoning mode (called “thinking mode”) which is not free. Claude is still the best writer and most creative. Turn on “artifacts” (in settings) to work on a document separate from the chat. A new Research tool lets you search across both the web and your internal documents.
ChatGPT can search the web (for ideas and not just words) is multimodal and can interact with live video and voice, but there are lots of different versions. You should mostly use 4o (omni). 4o-mini is faster, but less precise (a typical tradeoff) and there is also a new set of 4.1 models (API only so far that replaces the more creative 4.5) and a series of o3 and o4mini reasoning models that can search the web, reason about images and are agenic (see below). There is also a “canvas” where you can collaborate. They absolutely need a better set of names! Note that ChatGPT now remembers everything you ever said to it, but you can opt-out in settings. Copilot is really another version of ChatGPT (Microsoft owns half of OpenAI) that integrates with Microsoft projects–so it is better with Excel and Ppt, but none are great with that yet. If your organization gives you access to this in MS Office you also are FERPA and HIPPA secure. In CoPilot you can try Think Deeper (the o1 reasoning model). Paid ChatGPT ($20/month) gets you more advanced versions of all of this and the ability to create custom GPTs (see Custom Bots.)
Gemini (formerly Google Bard) is the third powerhouse multimodal model. Gemini 2.5 Pro is available in Google AI Studio where you can also find LearnLM, Flash 2 Images and graphing (see below). Gemini has a huge token window (so it is the best for large files) and allows you to search the web, watch live video, interact with voice and use DeepResearch (see below). Google also has a deal with the Associated Press to get real-time news updates.)
All three of these models allow you to create split screen documents, code or a website outside of the chat. (Claude calls these “artifacts” and ChatGPT and Gemini both have a “Canvas” feature.)

OTHER MODELS

Grok 3 (released February 17, 2025) is a new third generation family of models (including a mini version) that is currently free. It can analyze images and has real-time access to the internet and social network X. Grok Studio is a collaborative space and also has direct Google Drive integration. Grok also allows users to have conversations with “extreme personalities.”
Sonus is a new “set”family” of proprietary models (Pro, Air, Mini and Pro with Reasoning, see below) that is already competitive with the very best existing models. For the moment, this is the best (only) way you can try the new reasoning models for free: here.
WolframAlpha combines the computational powers of Wolfram|Alpha with ChatGPT.
Ernie from Baidu (the Chinese Google search engine) is another multimodal frontier model. Ernie X1 is a reasoning model.
Pi is focused on dialogue and role-playing and has a good voice interface.
You.com is set up to be a search engine competitor to Google but with more privacy and easier customization (so it now faces competition from Gemini, but also ChatGPT Search).
Amazon Nova are both good Class 2 models (so on a par with GPT 4).
Poe (currently $5/month!!) ChatPlayground ($17/month) and ChatHub are consolidators that provides access to multiple AI through one interface.
Ethan Mollick has written an excellent summary (Jan 26, 2025) of the differences and how to pick which model to use.

OPEN SOURCE MODELS: There are now open source models that are just as good as the best proprietary frontier models, and even better in some specialized areas. You can download most of these models from Github, Azure (Microsoft) or HuggingFace. You can then fine-tune and run them on your laptop, which deals with most privacy issues but also transfers the security risk to you.

Chinese DeepSeek is strong with text and can also search the web. You can try the excellent R3 model here. It is a cheaper API option and it was built for a fraction of the price/chips/energy of the big models through the clever use of Multi-head Latent Attention (MLA) that combines even more values into tokens (the simple version is tokens that read phrases, so less precision but turns out it was not needed and not all tokens are active all the time for a huge energy, cost and time savings). Here is a great non-technical summary of how DeepSeek is important or you could read this tech paper.
Meta AI is now Llama 4 (and now a family of models) which is a huge Class 3 model (which means it can remember more pages than others) but it they also seemed to have fudged the benchmarks. It does not require a login.
Mistral (available as an API and Le Chat and also in a reasoning version called Magistral) is an open source LLM from France that real time internet search (with press wire access for news) and is very fast and more multilingual than the big four. It also creates great images using Black Forest Labs Flux Ultra. Mistral Medium outperformed Claude 3.7 and 4o in many benchmarks.
Kimi is an excellent free multimodal open source Chinese AI that has a particularly large context window (good for long papers, prompts, and conversations – you can upload 50 files 100MB EACH), does very well in math and coding (beating GPT-4o and Claude Sonnet 3.5 on Codeforces), searches the web, can analyze charts and also has reasoning.
Qwen 2.5 from Alibaba comes in a range of levels (select at the upper left in Qwen Chat). Qwen2.5 beats that beats GPT-4o, Claude-3.5 Sonnet, and DeepSeek-V3 while Qwen2.5-1M can handle up to one million tokens. Qwen2-Math scores high (maybe the highest for pure LLMs?) at math.
Deep Cognito also has a family of open-sources models in a variety of sizes.
MiMo from Xiaomi is an open-source reasoning model that outperforms o1mini.
Huggingface is a chatbot running on Llama. Start here to get a sense of what open source can do. No login is required.
Llama 3 from Meta, with 70B parameters is very close in performance to the best paid models.
Falcon (Mamba 7B ) is an open source LLM from the UAE uses new “state space” architecture (SSLM)instead of the transformer architecture.

REGIONAL and CULTURALLY-SPECIFIC MODELS: Since people and cultures think differently, we are starting to see LLMs that are trained on culturally specific data sets. Note that if you want a culturally specific answer, you can and should still try this with the frontier models (try asking Claude to response as a Black professor and compare the response to Latimer).

Latimer (named after African-American engineer Lewis Latimer) aims to better represent diverse communities by adding further training from (verified and licensed) books, oral histories and sources from Black and Brown communities.(Latimer is a fine-tuned version of LLAMA.)
Mistral Saba is a 24B parameter version of the French Mistral model trained on curated datasets from across the Middle East and South Asia. It supports Arabic and many Indian-origin languages like Tamil.
Fanar is a “culturally and regionally aware” Arabic LLM fluent in Arabic dialects from the Qatar Ministry of Communications and Information Technology (MCIT) and the Qatar Computing Research Institute of Hamad Bin Khalifa University (HBKU).
LatamGPT from Chile’s National Center for Artificial Intelligence (Cenia) is an open source model trained on “characteristic data from Latin America” and is due to launch in summer of 2025.
There is also a group of behavioral scientists exploring the training of LLMs on historic texts Viking, Latin or Medieval Arabic etc.)

REASONING MODELS: The next big thing are models that process through problems before answering. They do NOT actually reason (although it appears that way) but they have internal instructions that break problems down into steps which (especially when combined with web searching) improves accuracy and allows much more complicated problem solving. You need to use them a little differently (there is a prompt template here): give it something hard to do and note (or ask) how it describes its reasoning. Look at this example. The progress here has been rapid and substantial (read this report about the new o3 from Dec 2024), but in mid-January 2025 they were all behind paywalls. You can try these for free:

ChatGPT (o3 and 4.5) have a “Thinking” or “Reason” mode in the chat box (the button types and location keep changing). You can also access OpenAi’s o1 model through CoPilot by clicking the Think Deeper button. With a paid Pro subscription there are many more models, but the numbering is a joke.
In Gemini this is the “Deep Reasoning” button (which includes web search) also available in Google’s AI Studio.
Sonus Pro with Reasoning (a new model from a start-up!) Select the Pro version with Reasoning turned on) but Open sources models appear to be keeping up.
DeepSeek’s v3 R1 is also open source which means you can download and build with it. You can try DeepSeekR1 on an American server using Perplexity (Pro Search). Here is a great non-technical summary of how DeepSeek works that includes a good summary of how reasoning models work.
Magistral is the reasoning version of Mistral.
Qwen 2.5 Max is another Chinese open source reasoning model. Select the 2.5Max model in the upper left when you are in Qwen chat. It is also available in HuggingFace.
Kimi also has reasoning, but I’ve not tried it yet.
Claude.ai Sonnet 3.7 also has a “thinking mode” for paid subscribers.
Manus is an agent (see below) but also works as a reasoning model that can conduct research and produce documents.

EpochAI is an important independent organization that is keeping track of these models, how they compare and where we might be going. They maintain a great dashboardcomparing capabilities of the best models (against their own benchmarks) and also this larger data set of virtual all models. They produce excellent reports about trends including a recent prediction that AI will continue to improve rapidly.

APIs = APPLICATION PROGRAM INTERFACE. This is a huge category and most of the new products you see are here using an API to interact with one of the frontier models for you. You can often replicate the results with longer and careful prompting, but these are very useful shortcuts.

APIs for INFORMATION, LITERATURE SEARCH & RESEARCH

Perplexity.ai was designed as an AI-powered chatbot search engine so it gives you the most flexible interface and options for the TYPE of web search you want–academic, social, financial etc. It answers your questions with the sources cited using multiple frontier models. Perplexity also has an Internal Knowledge Search that will search your files for info and many other useful tools.
Consensus.app and Elicit.com are academic research tools that limit data search to the 200M published papers in Semantic Scholar and use AI (ChatGPT) to allow you to filter by claims, methodology, sample size and more. Consensus includes a “consensus meter” that provides an estimate of the consensus in the published literature. Here is the result when asking “do brain games work?” Elicit has a filter that allows you to search by the quality of journal.
Storm (short for brainstorm) is a new research tool from Stanford that creates a Wikipedia-like report on the topic of your choice. It looks at more than just Semantic Scholar publications. It will write/summarize from different perspectives (ex. sociologist vs political scientist) and tell you what sources it used. Compare the results and format with what you get from Consensus.
Here is a comparison of Consensus, Elicit, Storm and Perplexity answering the question “do polls predict elections?”
FutureHouse is a collection of tools for scientific discovery. The “Owl” tool, for example, does a “precedent search” (has someone already done this?) and “Phoenix” which is an agent that uses cheminformatics tools to do chemistry and can plan synthesis and design new molecules. You can watch a demo here.
Compare these to Google’s LearnAbout.
Researchrabbit and Litmaps are both more visual, showing you a network of articles that relate to a topic. It is a bit like Spotify for research papers (you liked that, you should know about this).
SciSpace also has similar functions but with a broader suite of tools, like a paraphraser that rewrites or helps explain passages (something you can also find in ExplainPaper.) All four of these are essential lit review tools.
Scite extracts citations and uses AI to analyze if they are cited with support or contradiction in other papers. Upload a pdf or your citations and find out quickly about the impact of the work.
Semantic Scholar a free AI search tool with a pdf reader.
DeepResearch (available only with Gemini Advanced) creates a multi-step research plan. If you approve, it searches the web, analyzing relevant information. It repeats this process multiple times to generate a report of the key findings with links (which you can export into a Google Doc). The difference from the other research tools is this is agenic: it searches the web and repeats the process.

APIs for ANALYSIS, & WORKING with DATA. Here is a subcategory of tools that allow you to control the data set or knowledge base:

NotebookLM is Googles version of a research assistant but it works only on the documents (up to 50) you upload (up to 500,000 words EACH). Use it as a virtual committee assistant: link or upload all of your institution policy documents plus the committee charge and then faculty can interrogate s needed. Try uploading a pdf (or 50) and asking for a study guide or an interactive mind map or podcast (you can interrupt to ask questions). Here is an AI-created podcast about the first part of my Teaching Change book. If you upload files to Gemini, you will see a “Generate Audio Overview” button appear which also creates a podcast summary.
Mem has similar features that allow you to “chat with your data.” Perplexity calls its version of this Spaces.
There are lots of “chat with your data” sites that pitch themselves as “research” tools, but are really aimed at students. Sites like Scholarcy, Mindgrasp (“Learn 10x faster!”) and CoralAI, (“read documents faster”) focus on summaries and audio transcriptions (like lectures you did not attend).
The tech behind the podcast feature in NotebookLM is Illuminate, which has more features: you can change the voices, accents or styles and turn any content or webpage into a dialogue. A competitor is GenFM from ElevenLabs.
Nomic Atlas îs best for analysis of huge unstructured data sets and does a range of visualizations. Julius also allows you to do computations and visualizations with your data and also writes reports, finds insights and does analysis.
Ailyze is focused on qualitative data and insights and also has an AI-driven interviewer to help with collecting data.
Napkin creates infographics and visualizations from your text. You don’t have to prompt–just drop in the items you want to visualize and you get lots of options that you can then alter with easy tools. Infography also makes infographics from text.

APIs for WRITING, GRADING, TUTORS and MORE

WRITING: Grammarly, and Quillbot are already known to students (before they were enhanced with AI) and (along with the newer Caktus) blur the line of improving with cheating, offering to write paragraphs, solve problems, answer questions and check if your content can bypass AI detectors. Try Lex first for your own writing. NotionAI, Copy.ai and Jasper and many many others focus on specific types of writing or business uses.

GRADING: Your choice here is either to use one of the best (smartest) frontier models (see above) but understand that is is naive and you will need to provide lots of very specific instructions (it needs a recipe) or you can use one of these already fine-tuned models (that already knows how to cook) that often use and older, cheaper and less smart (GPT 3.5) to do the work. There are already over 70 AI grading tools available.
- GradeWiz extracts students names before grading and then sends anonymous essays to multiple AIs and then compares the results.
- CoGrader does general grading and feedback and integrates with Google classroom. Try the 2.0 version here: https://v2.cograder.com/
- TimelyGrader can import assignments from your LMS and export grades back to it.
- AI For Teachers: Free for ChatGPT Plus users
- Gradescope: Also from Turnitin.
- Kangaroos AI: Customizable rubrics and bulk uploads.
- EssayGrader: Free option (with limited rubric customization) and allows bulk uploading.
- Smodin: Limited free version but includes language translation.
- GradeCam: Instant feedback that integrates with some LMS.
- SnapGrader: Includes a scanning feature.
ChatBot Assignments with Grading: Mostly K-12 at the moment, but look for new platforms like Parlay and Mizou that create specialized chatbots for problems or situations and then grade student interactions–all within a protected environment.

NOTE-TAKING: Microsoft OneNote, Otter.ai, Fireflies, and Zoom Companion all do more than just transcribe notes, organize and summarize (often across multiple documents/meeting ). They can analyze who is talking (or interrupting) the most and some even the emotions of participants. Find the latest list of “best” on your platform. We are now starting to see meeting note taking combined with agent tools so SpinachAI creates action items on Attio, HubSpot, Salesforce, or Zoho, sends customers status updates and creates documentation in Google Docs (all in 100 languages).

STUDY ASSISTANTS: Both Nurovant, Turbolearn.ai and GradeMaxx (and MANY more) can turn lecture notes, a pdf, a phone recording or a YouTube video into an outline, practice tests, flashcards, mind maps, quizzes or notes. Learn About is hoping to be a big player in this space.
NotebookLM also does this: try uploading notes, pdf (or up to 50 documents of content) and it will create a summary, sample questions, study guide or even a podcast: here is an AI-created podcast about the first part of my Teaching Change book. (What about an assignment that asks students what information is missing?)
Storm is a new research tool from Stanford that creates a Wikipedia-like report on the topic of your choice.
There are a growing number of more visual ways to explore new topics and all of these create concept maps (and each does more): Mapify, Albus, and Heuristica(which also has templates for pro/con boxes, controversial points, flash cards and more)

CUSTOM BOTS & TUTORS: (More on simulations and custom bots here.) Deploying a custom bot in an application is easy. Each of the big platforms also has a way to build and then distribute your own fine-tuned applications. I like BoodleBox because it also allows the teacher to see everything students do–and has lots of other faculty features like “coach mode” which the chat default (and won’t provide students with direct answers. There are also GPTs (from OpenAI), Assistants(from HuggingFace), Bots (from Poe). Faculty developed writing tutors, for example, include one from Mark Marino, AI Tutor Pro from a group of Canadian faculty and MyEssayFeedback in beta from Eric Kean. The University of Sydney has now created Cognifi which also allows complete security and control over student use but there will be some institutional cost (and I think pricing is still being worked out for other countries?)
BoodleBox, SchoolA I, MagicSchool and Khanmigo all provide tools to help with specific tasks that are free, FERPA compliant and secure. This includes creating specialized tutors. BoodleBox has these instructions. In SchoolA I go to Spaces and then Create. You can simply prompt it (Help students master content X by providing an overview and asking questions etc) or you can upload documents and set a standard for mastery. Importantly, SchoolAi also has a backend that tells you have students have engaged and what they might still be confused about. Here is a great example (solving Linear Equations in One Variable from Rebecca Tyler at Great Falls College MSU).
Snorkl provides feedback to student on their verbal or visual thinking.
How to Build Your Own Customized Chatbot (free chapter from Levy and Albertos (2024 Teaching Effectively with ChatGPT.
How to use Speaker Progress in Microsoft Teams to get feedback on your/student presentations

VOICE AND DIALOGUE: This exploded in May 2024.

Hume.ai both understands your emotions and the emotions in what it is saying.
Sesame has a very natural sounding voice to voice conversational style and I prefer it for practicing conversations.
GPT 4o in May 2024 was a real game-changer and it still has the best Live Mode in its Advanced Voice Mode. Claude and Gemini can also talk to you now live. Ask it to create a customized bedtime story. Watch some of the videos to get a sense of how seeing and hearing (and interrupting) might be leveraged as a tutor or for translation.
The engine behind the podcast feature in NotebookLM is ElevenLabs, which offers more customization options but also does other text to audio conversions. (More on voice clones below.)
Resemble AI also has a no-cost, open-source model called Chatterbox that can clone any voice using just five seconds of audio and can try it here.)
Here is a comparison of several models including the new Dia from NariLabs
Amazon Nova Sonic says it understand both what you say and how you say it.
Kokoro is an open-source model that also allows you to create natural sounding speech from text.
Octave (from Hume) does this text to speech but understands what it says so can add emotions…
Scribe gives you speech to text in 99 languages.
Pi is focused on dialogue and role-playing–click the speaker icon in the upper right to have a conversation.
See Communication and Relationships below for more.

LANGUAGE LEARNING: Duolingo was first out of the gate with Duolingo Max, but you can, of course, have a conversation with ChatGPT in another language and ask it to correct your usage and pronunciation. also get a long way with the and now there are a slate of immersive language tools like Langua (more engaging than Duolingo), Glossarie.app, Speak,and TalkPal. Mondly VR and ImmerseMe add an immersive VR element. Speakable bills itself as your all-in-one language TA.
Try DeepL for translations.
LingoSub adds subtitles in any language to videos (allowing you to practice new languages).
Note that Google Meet now includes real-time language translation.

IMAGES, SLIDES, VIDEO, MUSIC and more Multimodal AI can both create and analyze sound, pictures and video. Vertex AI allows you to use all of the Google creative models (Lyria for music, Veo for video, Imagen for images, Chirp for voice and a new Flow tool to create longer video) all in one place.

IMAGES: The easiest and maybe best way to make images (since about the middle of April 2025) is natively in ChatGPT and Gemini). This means the LLM directly controls the image making and it means your prompts can be more subtle. (Try making an infographic for a complicated topic: the LLM can now both work out the concepts and put them into an image.) Note that in the general models, you need to include the stylistic prompts yourself (do you want a wide-angle realistic photo or a cubist painting? In other words–the words used to describe visual images and video are important knowledge for the user!) Ask for an image and then add an object or change your background or hair color (“don’t change anything else”). In ChatGPT you can also upload a reference image or a color palette. Still, there are plenty of other image generators:
Imagen3 is a great image generator from DeepMind, but you can now use it either in ImageFX (also Google) or in Gemini directly. Either for free.
Ideogram 3 is a very solid free image generator.
Adobe Firefly allows you to use both its own models (trained exclusively on commercially safe content–i.e. they paid artists for the work that the model was trained on) with further access to Google, OpenAI, and Black Forest Labs. Model 4 is quicker but Ultra is better. Firefly Boards can do “hundreds of variations at once.”
Image Creator from Microsoft Designer still provides free access to the latest goodies in Dalle3.
Tess Design was trained on licensed art and allows artists to create and control their own style.
Whisk allows you to use images as prompts for new images.
Komiko creates comics, manga and other animations.
Reve is a bit hit or miss for me.
DALL-E by OpenAI (but not free for the best model)
Runway (based on Gemini): Note this new Act-1 feature that allows you to control animation with your own facial expressions.
Krea offers a range of visual Ai tools that do graphics, logos and images
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer is a free and super fast image generator from MIT that also uses less computing power.
Stable Diffusion is good, cheap and open source
Midjourney is one of the best but can only be used through Discord.
Generative AI by Getty Images creates stock images

SLIDES: Canva.com, Beautiful, PageOn, Plus AI, and Gamma all make slides easily. Here is a review of seven more. Plus AI is an extension that allows you to make slides with AI in Google Slides. Napkin is also useful for slides as it creates infographics and visualizations from your text (without prompting).
You can also build slides directly in Manus or Genspark.
Here is a video about how to make slides in Google Slides with Plus AI
Here is a video on How to make Slides with ChatGPT in Power Point.

VIDEO: The “best” model for video is shifting every week at the moment BUT here is a review of the top 17 Free AI Image-To-Video Tools. He liked Veo 2 and Kling 2 the best, but we now have Veo 3 (paid but see great examples here) and Hedra 3 (free) which I think are even better and the new Flow (paid).
Veo 3 (from DeepMind) and currently free in Google AI Studio is currently the best, and includes sound creation (dialogue). See great examples here. Vertex AI allows you to use all of the Google creative models.
Kling (from China’s Kuaishou Technology). Kling 2.1 is cheaper and close to Veo 3. It depends on which comparison you watch. Here is another more detailed comparison.
Sora (from OpenAI with ChatGPT Plus). You can use this in ChatGPT but Sora gives you a set of dropdown “presets” which can be useful if you do not already understand the language of art history, image and video styles. In Sora you can also open a present (like Film Noir or Balloon World) which will show you the instructions/preset prompt which you can then customize. You can also upload images you want to use for inspiration with the “attache media” button.
LTX from Lighttricks says it is the most powerful open-source video generator and can produce video faster than you can watch it…
Runway (based on Gemini). Gen4 is excellent but not free
Hailuo has impressed lots of folks and it is free for now.
Higgsfield has lots of cool motion control effects that allow you to do camera angles like overhead sweeps.
Flux
Krea
Movie Gen (from Meta)
Hedra 3 is very good.
Pika also allows you to make a video with your younger self…
HunyuanVideo from Tencent in China.
Genie 2 converts images into interactive virtual worlds/games.
Guidde can turn a pdf, slide deck or screen cast into a “how to” video in multiple languages.

MUSIC/SONGS: Suno will convert your instructions into a new song. As always more specific instructions yield better results, and not that the real benefit is that you can customize for specific people and events. (Try creating a gratitude song for a friend in a style they like with the details of what they did.) Udio and Riffusion also do this–and I think Riffusion has a little more complexity to it. You can also use Lyria in the Google Studio, which also has a “live” mode for jamming in real time. Think about how you could customize walk-on music for your class. Here is an example from Prof. Martin’s Service Management class sat the University of South Carolina.

AVATARS and VOICE CLONES: HeyGen was first and remains the leader for both video and voice clones. (You can now also send an avatar to a Zoom meeting.) HeyGen also allows you to add a knowledge base to an interactive avatar (think you as a TA) and its Avatar IV will generate a lip-synced avatar from a photo and a script. LHM from Alibaba provides a way to turn a full-body image into a 3D avatar. With a few seconds of your voice, you can then have a clone that will read text in your voice. MiniMax, ElevenLabs and Cartesia all offer a range of wants to turn text into audio using your voice or someone else’s.
AI COMPANIONS: Character.ai, nomi.ai, replika, elliq (focused on senior lonelliness) friend, EdSight talks to students as a way to help colleges hear student voices. Sesame offers two emotionally sensitive voices/companions.

MINI MODELS and EDGE AI: These are smaller, faster and more specialized (often) OPEN SOURCE tools that you customize to live and run on your phone. Note that the ways to make an LLM better are model size (see Frontier models above), data set size and and the amount of training. Since it is not clear that larger more capable models will be cost effective, these faster smaller models (with more training) may end up being more useful. Apple Intelligence will test this idea. More smaller models are coming.

Phi-3.5 from Microsoft comes in three sizes Mini, Small and Medium (3.8-41B parameters)
OpenELM is the Apple version that comes in four sizes (270M-3B parameters)
Gemma is the open source smaller model from Google also in several sizes

BROWSER EXTENSIONS: One way to become more familiar with how AI works is to add an AI extension to your browser. If you use Chrome, some good free extensions are Perplexity AI, in the Chrome store here) SciSpace (which does everything SCiSpace above does, but in your browser), Merlin AI, Bing, or Clipy AI: now every time you do a Google search, you will also get an AI response.

AGENTS: A chatbot can only chat with you, but an “agent” can plan and execute a series of tasks, like building you a website or finding information on your computer. Agents can use multiple tools and know when to switch, so an AI agent can manage a workflow. Here are details about the “Agent2Agent” (or A2A) or “Model Context Protocol” that create these two-way connections between data sources and AI-powered tools.

Start by watching the demos from Genspark or Manus. Then ask it to build you an interactive course website using the best research and including links to video and with interactive learning activities (or just a new episode of a TV show you like). ChatGPT o3 is starting to have some agenic capabilities. Devin is not quite there, but other early tools include Swarm and Codex (also open source!) from OpenAI, Claude Computer Use (Claude Code and even 3.7 can also do tasks like create and bedbug code) and Asana. OpenAI has introduced Operator (Jan 23, 2025)–it is called “computer use” in CoPilot. Here is a demo (from Graham Clay) where Operator has been asked to write an essay in a GoogleDoc at human speed with edits. There are now lots of demos of agents doing students homework. Another use of agents (that is also about growing use of synthetic data) is this simulated hospital with AI agents as both patients and doctors, which allowed the AI doctors to gain experience (treating 10,000 patients) and “evolve” become better. LinkedIn has an agent that helps recruit job seekers (and also an AI jobs match tool). Zoom has also given its AI companion some agency capability.

ROBOTS: There are new robots and robots platforms coming out every day, but start with the ridable hydrogen-powered Corleo from Kawasaki

You can find a complete list of AI products (tracked by Ithaka S+R) here

There is a great AI guide for students.