Building a Better Title-Caser, Part 1: Beyond Python str.title

by Audrey M. Roy Greenfeld | Fri, Feb 14, 2025

Title-casing text is one of those hard problems no one ever gets right, yet no one considers worthy enough to solve with AI. Here I experiment to see if I can improve upon the best existing solutions with a local Ollama Modelfile and a solid prompt.


Setup

import google.generativeai as genai
import ollama
from titlecase import titlecase

Built-In title

I begin by seeing what str.title does. It's built into Python, so nothing needs to be installed.

"hi".title()

PyPI titlecase

Now with pip install titlecase, I get this titlecase function:

titlecase("hi")

Simple Multi-Word Comparison

Both of these functions should do well with a simple test case:

text = "the quick brown fox"
print(f"title():    {text.title()}")
print(f"titlecase(): {titlecase(text)}")

With Apostrophes

# With apostrophes
text2 = "it's a beautiful day in mr. rogers' neighborhood"
print(f"\ntitle():    {text2.title()}")
print(f"titlecase(): {titlecase(text2)}")

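Here titlecase correctly lowercased the article "a" and the preposition "in". It also left "It's" intact, whereas str.title treats the apostrophe as a word boundary and produces "It'S".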
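The apostrophe behavior of str.title can be worked around with a regex. Here's a minimal sketch adapted from the workaround in the Python documentation for str.title. The name title_keep_apostrophes is my own, chosen to avoid clashing with the PyPI titlecase function. Note that it still capitalizes every word, small words included, so it only fixes the apostrophe issue.

```python
import re

def title_keep_apostrophes(s):
    """Capitalize each word, treating an inner apostrophe as part of the word."""
    return re.sub(
        r"[A-Za-z]+('[A-Za-z]+)?",
        lambda mo: mo.group(0).capitalize(),
        s,
    )

text2 = "it's a beautiful day in mr. rogers' neighborhood"
print(text2.title())
# It'S A Beautiful Day In Mr. Rogers' Neighborhood
print(title_keep_apostrophes(text2))
# It's A Beautiful Day In Mr. Rogers' Neighborhood
```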

Modern Terms With Unconventional Capitalization

text3 = "iphone and e-mail tips for pdfs"
print(f"\ntitle():    {text3.title()}")
print(f"titlecase(): {titlecase(text3)}")

My use case would be to title-case voice-dictated text. Here there's something tricky, because E-Mail is one of those terms where the hyphenation is debatable and undergoing change. Personally, I prefer email without the hyphen. It's interesting how I voice-dictated this paragraph (Wispr Flow) and it ended up both ways!

My preference for a return value here is iPhone and Email Tips for PDFs. In situations where a hyphenated word is optionally unhyphenated, I'd like the title-casing function to unhyphenate and then title-case it. If that's not possible, my backup preference is iPhone and E-Mail Tips for PDFs.
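To pin down what I mean, here's a rough sketch of that preference as a wrapper around any title-casing function. The UNHYPHENATE and SPECIAL_CASING tables and the with_preferences name are hypothetical, made up for illustration; neither str.title nor titlecase provides them. A real version would need much larger tables, which is part of why this problem is hard.

```python
import re

# Hypothetical lookup tables for illustration only.
UNHYPHENATE = {"e-mail": "email"}
SPECIAL_CASING = {"iphone": "iPhone", "pdfs": "PDFs"}

def with_preferences(text, title_caser=str.title):
    """Unhyphenate known terms, title-case, then restore special casing."""
    # Step 1: unhyphenate optionally hyphenated terms before title-casing.
    for hyphenated, plain in UNHYPHENATE.items():
        pattern = rf"\b{re.escape(hyphenated)}\b"
        text = re.sub(pattern, plain, text, flags=re.IGNORECASE)

    # Step 2: delegate the general work to whichever title-caser is passed in.
    titled = title_caser(text)

    # Step 3: restore terms with unconventional capitalization.
    def fix(match):
        word = match.group(0)
        return SPECIAL_CASING.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", fix, titled)

print(with_preferences("iphone and e-mail tips for pdfs"))
# iPhone And Email Tips For PDFs
```

With the default str.title, the small words "And" and "For" stay capitalized. Any other title-casing function, such as titlecase, can be passed as title_caser instead.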

Using a Hosted LLM as a Title-Caser

def tc_gemini(s):
    # Assumes the Gemini API key has already been configured, e.g. with genai.configure
    model = genai.GenerativeModel('gemini-1.5-flash-latest')
    resp = model.generate_content(
        f"Convert '{s}' to title case, please. Return ONLY the title-cased string.",
        safety_settings=[],
        request_options={"timeout": 1000},
    )
    # resp.text raises if the response contains no text; let that error propagate
    return resp.text

tc_gemini(text3)

Gemini 1.5 Flash works decently as a title-caser with this simple prompt. I noticed, though, that the "mail" in "e-mail" isn't capitalized. This is a rule that people find confusing: when a word is hyphenated, each part of the hyphenated word should be capitalized.
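The hyphenation rule is simple enough to apply as a post-processing step. This is a minimal sketch, not something from any library: capitalize_hyphenated is a hypothetical helper, and the input string is an illustrative example rather than a recorded model response.

```python
def capitalize_hyphenated(title):
    """Capitalize every part of each hyphenated word; leave other words alone."""
    words = []
    for word in title.split(" "):
        if "-" in word:
            # Uppercase only the first character of each part, keeping the rest as-is
            word = "-".join(part[:1].upper() + part[1:] for part in word.split("-"))
        words.append(word)
    return " ".join(words)

print(capitalize_hyphenated("iPhone and E-mail Tips for PDFs"))
# iPhone and E-Mail Tips for PDFs
```

Words without a hyphen pass through untouched, so lowercase small words and terms like iPhone keep whatever casing the title-caser gave them.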

This feels a bit wasteful, though: it takes a lot of API calls to a service that will likely cost money in the future. I suppose you'd want to batch them if you went this way. I think it would be a lot nicer, though, to use a small local LLM for simple tasks like this.
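If I did go the batching route, one way to do it would be to send several titles in a single numbered prompt and split the reply back into a list. This is only a sketch under assumptions: build_batch_prompt and parse_batch_response are hypothetical helpers, and it assumes the model answers with one numbered line per title, which it may not always do.

```python
import re

def build_batch_prompt(titles):
    """Combine several titles into one numbered prompt."""
    numbered = "\n".join(f"{i}. {t}" for i, t in enumerate(titles, start=1))
    return (
        "Convert each numbered line below to title case. "
        "Return ONLY the numbered, title-cased lines, one per line.\n"
        + numbered
    )

def parse_batch_response(response_text, expected_count):
    """Extract the title from each 'N. Title' line of the model's reply."""
    results = []
    for line in response_text.splitlines():
        match = re.match(r"\s*\d+\.\s*(.+?)\s*$", line)
        if match:
            results.append(match.group(1))
    # Fail loudly if the model skipped or added lines
    if len(results) != expected_count:
        raise ValueError(f"expected {expected_count} titles, got {len(results)}")
    return results

titles = ["the quick brown fox", "iphone and e-mail tips for pdfs"]
prompt = build_batch_prompt(titles)
print(prompt)
```

The prompt string would take the place of the single-title prompt in the generate_content call, and the response text would go through parse_batch_response.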

Using Small Local LLMs as Title-Casers

def tc_ollama(s, model='mistral'):
    # Call ollama with a simple title-case prompt
    response = ollama.chat(model=model, messages=[{
        'role': 'user',
        'content': f"Convert '{s}' to title case. Return ONLY the title-cased string with no explanation or quotes."
    }])
    return response['message']['content'].strip()
print(tc_ollama(text3))

Mistral is quite good. Let's try others:

# Let's try a few different models to compare
models = ['llama3.2', 'tinyllama', 'deepseek-r1:7b', 'deepseek-coder:33b', 'qwen2.5:3b']
print("\nComparing models:")
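for model in models:
    try:
        # Width 20 fits the longest model name so the outputs line up
        print(f"{model:20}: {tc_ollama(text3, model)}")
    except Exception as ex:
        # Report the failure (e.g. the model isn't pulled locally) and keep going
        print(f"{model:20}: Failed ({ex})")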
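Eyeballing the outputs gets tedious as the list of models grows, so here's a small sketch for grading each output against my stated preferences for this test string. The clean and grade helpers are hypothetical, and the sample strings in the loop are made up for illustration; they are not actual model outputs.

```python
PREFERRED = "iPhone and Email Tips for PDFs"
BACKUP = "iPhone and E-Mail Tips for PDFs"

def clean(output):
    """Keep the first non-empty line and strip surrounding quotes or backticks."""
    for line in output.splitlines():
        # Caveat: this would also strip a legitimate trailing apostrophe
        line = line.strip().strip("'\"`").strip()
        if line:
            return line
    return ""

def grade(output):
    """Return 'preferred', 'backup', or 'miss' for one model output."""
    cleaned = clean(output)
    if cleaned == PREFERRED:
        return "preferred"
    if cleaned == BACKUP:
        return "backup"
    return "miss"

# Made-up sample outputs, for illustration only
samples = [
    '"iPhone and Email Tips for PDFs"\n',
    "iPhone and E-Mail Tips for PDFs",
    "Iphone And E-Mail Tips For Pdfs",
]
for sample in samples:
    print(f"{grade(sample):10}: {sample.strip()}")
```

In the comparison loop, grade(tc_ollama(text3, model)) could be printed next to each model name.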