Building a Better Title-Caser, Part 1: Beyond Python str.title
by Audrey M. Roy Greenfeld | Fri, Feb 14, 2025
Title-casing text is one of those hard problems no one ever gets right, yet no one considers worthy enough to solve with AI. Here I experiment to see if I can improve upon the latest best solutions with a local Ollama modelfile and a solid prompt.
I begin by seeing what str.title does. It's built into Python, so nothing needs to be installed.
"hi".title()
PyPI titlecase
Now with pip install titlecase, I get this titlecase function:
from titlecase import titlecase
titlecase("hi")
Simple Multi-Word Comparison
Both of these functions should do well with a simple test case:
text = "the quick brown fox"
print(f"title(): {text.title()}")
print(f"titlecase(): {titlecase(text)}")
With Apostrophes
# With apostrophes
text2 = "it's a beautiful day in mr. rogers' neighborhood"
print(f"\ntitle(): {text2.title()}")
print(f"titlecase(): {titlecase(text2)}")
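Here titlecase correctly kept the small words "a" and "in" lowercase, whereas str.title capitalizes every word and also the letter after an apostrophe ("It'S").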
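The apostrophe behavior can be checked without installing anything. The sketch below shows str.title treating the apostrophe as a word boundary, alongside the regex-based workaround described in the Python documentation for str.title. The function name apostrophe_safe_title is a hypothetical one chosen for this illustration. It only fixes apostrophes and still capitalizes small words such as "a" and "in".

```python
import re

def apostrophe_safe_title(s):
    # Treat letters plus an optional apostrophe-suffix ("it's") as one word,
    # then capitalize each matched word. Hypothetical helper name.
    return re.sub(
        r"[A-Za-z]+('[A-Za-z]+)?",
        lambda mo: mo.group(0).capitalize(),
        s,
    )

print("it's a beautiful day".title())                # It'S A Beautiful Day
print(apostrophe_safe_title("it's a beautiful day"))  # It's A Beautiful Day
```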
Modern Terms With Unconventional Capitalization
text3 = "iphone and e-mail tips for pdfs"
print(f"\ntitle(): {text3.title()}")
print(f"titlecase(): {titlecase(text3)}")
My use case is title-casing voice-dictated text. There's something tricky here: E-Mail is one of those terms whose hyphenation is debatable and still changing. Personally, I prefer email without the hyphen. Interestingly, I voice-dictated this paragraph (with Wispr Flow) and it ended up both ways!
My preference for a return value here is iPhone and Email Tips for PDFs. In situations where a hyphenated word is optionally unhyphenated, I'd like the title-casing function to unhyphenate and then title-case it. If that's not possible, my backup preference is iPhone and E-Mail Tips for PDFs.
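To make that preference concrete, here is a minimal rule-based sketch that produces the preferred string for this one test case. Everything in it is an assumption made for illustration: the PREFERRED override table, the SMALL_WORDS set, and the function name title_with_overrides are hypothetical, and a real title-caser would need far larger tables.

```python
# Hypothetical override table: lowercase spelling -> preferred output.
PREFERRED = {
    "iphone": "iPhone",
    "e-mail": "Email",  # unhyphenate, then title-case
    "email": "Email",
    "pdfs": "PDFs",
}

# Hypothetical set of small words kept lowercase mid-title.
SMALL_WORDS = {"a", "an", "and", "as", "at", "but", "by",
               "for", "in", "of", "on", "or", "the", "to"}

def title_with_overrides(text):
    words = text.split()
    out = []
    for i, word in enumerate(words):
        key = word.lower()
        if key in PREFERRED:
            out.append(PREFERRED[key])
        elif key in SMALL_WORDS and 0 < i < len(words) - 1:
            # Small words stay lowercase unless first or last.
            out.append(key)
        else:
            out.append(word.capitalize())
    return " ".join(out)

print(title_with_overrides("iphone and e-mail tips for pdfs"))
# iPhone and Email Tips for PDFs
```

A lookup table like this breaks down quickly on unseen terms, which is part of the motivation for trying an LLM instead.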
Using a Hosted LLM as a Title-Caser
import google.generativeai as genai

def tc_gemini(s):
    # Assumes genai.configure(api_key=...) has already been called
    model = genai.GenerativeModel('gemini-1.5-flash-latest')
    resp = model.generate_content(
        f"Convert '{s}' to title case, please. Return ONLY the title-cased string.",
        safety_settings=[],
        request_options={"timeout": 1000},
    )
    try:
        return resp.text
    except Exception as ex:
        raise ex
tc_gemini(text3)
Gemini 1.5 Flash works decently as a title-caser with this simple prompt. I noticed, though, that the "mail" in "e-mail" isn't capitalized. That's a case people find confusing. The rule is that when a word is hyphenated, each part of the hyphenated word should be capitalized.
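The hyphenation rule is simple enough to sketch in a few lines. The helper name capitalize_hyphenated is hypothetical, and it covers only this one rule, not small words or brand names.

```python
def capitalize_hyphenated(word):
    # Capitalize each hyphen-separated part of a single word.
    return "-".join(part.capitalize() for part in word.split("-"))

print(capitalize_hyphenated("e-mail"))  # E-Mail
```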
This feels a bit wasteful, though: it takes a lot of API calls to a service that will likely cost money in the future. I suppose you'd want to batch them if you went this way. I think it would be a lot nicer, though, to use a small local LLM for simple tasks like this.
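For what batching might look like, here is a rough sketch that packs several strings into one numbered prompt and parses a numbered reply. It makes no API call. The function names build_batch_prompt and parse_batch_response are hypothetical, and it assumes the model echoes the numbering back, which is not guaranteed.

```python
import re

def build_batch_prompt(titles):
    # One numbered line per input string.
    numbered = "\n".join(f"{i}. {t}" for i, t in enumerate(titles, start=1))
    return (
        "Convert each numbered line below to title case. "
        "Return ONLY the numbered title-cased lines, one per line.\n"
        + numbered
    )

def parse_batch_response(response_text, expected_count):
    # Map each "N. text" line back to its position; missing items become None.
    results = {}
    for line in response_text.splitlines():
        match = re.match(r"\s*(\d+)[.)]\s*(.+?)\s*$", line)
        if match:
            results[int(match.group(1))] = match.group(2)
    return [results.get(i) for i in range(1, expected_count + 1)]
```

The prompt string would be sent in a single request, and any None in the parsed list marks an item worth retrying on its own.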
Using Small Local LLMs as Title-Casers
import ollama

def tc_ollama(s, model='mistral'):
    # Call ollama with a simple title-case prompt
    response = ollama.chat(
        model=model,
        messages=[{
            'role': 'user',
            'content': f"Convert '{s}' to title case. Return ONLY the title-cased string with no explanation or quotes.",
        }],
    )
    return response['message']['content'].strip()
print(tc_ollama(text3))
Mistral is quite good. Let's try others:
# Let's try a few different models to compare
models = ['llama3.2', 'tinyllama', 'deepseek-r1:7b', 'deepseek-coder:33b', 'qwen2.5:3b']
print("\nComparing models:")
for model in models:
    try:
        print(f"{model:10}: {tc_ollama(text3, model)}")
    except Exception:
        print(f"{model:10}: Failed")
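Small models do not always honor "Return ONLY the title-cased string", and reasoning models such as deepseek-r1 typically emit a <think>...</think> block before the answer. A cleanup step like the sketch below may make the comparison fairer. The helper name clean_model_output is hypothetical, and keeping only the first non-empty line is an assumption about where the answer lands.

```python
import re

def clean_model_output(raw):
    # Drop any <think>...</think> reasoning block.
    text = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    # Keep the first non-empty line (assumed to hold the answer).
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    first = lines[0] if lines else ""
    # Remove one pair of matching wrapping quotes, leaving a lone
    # trailing apostrophe (as in "Rogers'") untouched.
    if len(first) >= 2 and first[0] == first[-1] and first[0] in "\"'`":
        first = first[1:-1]
    return first
```

It could be applied to the return value of tc_ollama before printing each result.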