Generating Unlimited Stories With an AI LLM on Your Computer¶
We will be building an AI Story Generator with FastAPI, Raw SQL, and Ollama
Prerequisites and Setup Guide¶
Before we begin, you'll need:
- Basic Python knowledge (functions, async/await, error handling)
- Basic understanding of HTTP requests and REST APIs
- Familiarity with SQL fundamentals
- A code editor (VS Code recommended)
- Python 3.8+
- Ollama installed on your machine
Introduction: What Are We Building?¶
We're creating an AI-powered story generator web service that:
- Accepts story prompts from users
- Generates creative stories using LLM APIs
- Stores and retrieves stories using raw SQL
- Serves results via FastAPI endpoints
This project teaches valuable skills in:
- Building modern async web APIs
- Working with databases without ORMs
- Integrating AI services
- Structuring larger applications
Project Structure¶
ai-story/
├── app/
│ ├── __init__.py
│ ├── config.py
│ ├── database.py
│ ├── local_llm.py
│ └── main.py
├── .env
├── requirements.txt
└── stories.sqlite3
Installation¶
$ mkdir ai-story
$ cd ai-story
# create a virtual environment using the venv module (-m) and name it venv
$ python3.11 -m venv venv
$ source venv/bin/activate
# Windows users activate it like this instead:
# $ .\venv\Scripts\activate
Required packages:
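The original list isn't shown here, but based on the imports used throughout this tutorial, a minimal requirements.txt would look like this (versions left unpinned as an assumption):

fastapi
uvicorn[standard]
aiosqlite
httpx
pydantic-settings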
Introduction to Ollama¶
Ollama is a framework that lets you run large language models (LLMs) locally on your machine. Instead of relying on cloud-based services like OpenAI, you can run models like Llama 3 directly on your computer.
- Home: https://ollama.com/
- Download Ollama: https://ollama.com/download
- Ollama LLM Models: https://ollama.com/library
Key Ollama Commands:¶
# Install Ollama (Mac/Linux)
curl -fsSL https://ollama.com/install.sh | sh
# List available models
ollama list
# Pull a model (example: Llama 3.2, 1B parameters)
ollama pull llama3.2:1b
# Start Ollama server
ollama serve
# Run a model in CLI
ollama run llama3.2:1b
The Ollama API runs on http://localhost:11434 by default.
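You can sanity-check that the server is up with a quick request against the same endpoint our app will call later (assuming you've already pulled llama3.2:1b):

curl http://localhost:11434/api/generate -d '{"model": "llama3.2:1b", "prompt": "Say hello", "stream": false}'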
Part 1: Environment Configuration¶
- Create a .env file:
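The file's contents aren't shown in the original, but given what config.py reads below, it would contain something like this (the values are assumptions you can adjust):

OLLAMA_API_URL=http://localhost:11434/api/generate
LLM_MODEL=llama3.2:1b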
Step 1.1: Configuration Setup¶
- Create config.py:
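The original listing is missing, so here is a minimal sketch using pydantic-settings; the attribute names match the settings.OLLAMA_API_URL and settings.LLM_MODEL that local_llm.py reads later:

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Defaults mirror the fallbacks used in local_llm.py
    OLLAMA_API_URL: str = "http://localhost:11434/api/generate"
    LLM_MODEL: str = "llama3.2:1b"

    # Load overrides from the .env file created above
    model_config = SettingsConfigDict(env_file=".env")

settings = Settings()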
Why This Matters:¶
- Separates configuration from code
- Makes it easy to switch between different LLM models
- Allows for development/production environment differences
Step 1.2: Database Setup¶
- Create database.py:
import aiosqlite

DB_NAME = "stories.sqlite3"
TABLE_NAME = "t_stories"

# SQL queries as constants for better maintenance
qs_select_all = f"SELECT id, prompt, content, generation_duration FROM {TABLE_NAME}"
qs_select_one = f"SELECT id, prompt, content, generation_duration FROM {TABLE_NAME} WHERE id = ?"
qs_delete_one = f"DELETE FROM {TABLE_NAME} WHERE id = ?"
qs_insert_one = f"INSERT INTO {TABLE_NAME} (prompt, content, generation_duration) VALUES (?, ?, ?) RETURNING id, prompt, content, generation_duration"

async def init_db():
    async with aiosqlite.connect(DB_NAME) as db:
        await db.execute(
            f"""
            CREATE TABLE IF NOT EXISTS {TABLE_NAME} (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                prompt TEXT NOT NULL,
                content TEXT NOT NULL,
                generation_duration REAL NOT NULL,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
            )
            """
        )
        await db.commit()

async def get_db():
    # Open a connection per request and always close it afterwards
    db = await aiosqlite.connect(DB_NAME)
    try:
        yield db
    finally:
        await db.close()
Key Improvements:¶
- Using constants for SQL queries improves maintainability
- Added generation_duration to track performance
- Consistent table naming with a prefix (t_stories)
- Note: the RETURNING clause in qs_insert_one requires SQLite 3.35+; check your Python's bundled version with python -c "import sqlite3; print(sqlite3.sqlite_version)"
Understanding get_db and Dependencies¶
This might look simple, but there's a lot happening here! Let's break it down:
- First, we create an async function that connects to our database
- We use yield instead of return - this makes it a "dependency generator"
- The try/finally block ensures our database connection always gets closed, even if there's an error
Think of get_db like a responsible librarian:
- When you need a book (database connection), they get it for you
- They keep track of who has what book
- When you're done, they make sure the book gets put back properly
Now, here's how we use it in our routes:
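For example, a route signature that requests a connection (this mirrors the get_stories route we'll write in Part 3; app and get_db come from the surrounding project):

from aiosqlite import Connection
from fastapi import Depends

@app.get("/stories/")
async def get_stories(db: Connection = Depends(get_db)):
    ...  # use db here; FastAPI cleans it up for us afterwards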
See that Depends(get_db) part? It's telling FastAPI:
"Hey, before running this function, I need a database connection. Use the get_db function to get one. When the function is done, clean up the connection"
Cool, right? This way, we never have to worry about managing database connections manually!
Part 2: Local LLM Integration¶
- Create local_llm.py:
import httpx

from app.config import settings

OLLAMA_API_URL = settings.OLLAMA_API_URL or "http://localhost:11434/api/generate"
LLM_MODEL = settings.LLM_MODEL or "llama3.2:1b"

async def generate_story(prompt: str) -> str:
    async with httpx.AsyncClient() as client:
        try:
            response = await client.post(
                OLLAMA_API_URL,
                json={
                    "prompt": prompt,
                    "model": LLM_MODEL,
                    "stream": False,
                },
                headers={"Content-Type": "application/json"},
                # Local inference can be slow, so allow up to 5 minutes
                timeout=300.0,
            )
            response.raise_for_status()
            res = response.json()
            return res.get("response")
        except httpx.HTTPStatusError as he:
            print(f"HTTP Error: {he.response.status_code} - {he.response.text}")
            raise Exception(f"HTTP Error: {str(he)}")
        except Exception as e:
            raise Exception(f"Story generation failed: {str(e)}")
Important Points:¶
- Using httpx for async HTTP requests
- Extended timeout for local model inference
- Proper error handling for HTTP and general errors
- Non-streaming mode for simpler implementation
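Before wiring this into FastAPI, you can smoke-test the module on its own (a quick sketch; it assumes ollama serve is running and the model has been pulled):

import asyncio

from app.local_llm import generate_story

story = asyncio.run(generate_story("Tell a two-sentence story about a robot."))
print(story)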
Part 3: FastAPI Application¶
- Create main.py:
import time

from aiosqlite import Connection
from fastapi import FastAPI, Depends, HTTPException
from pydantic import BaseModel

from app.database import (
    get_db,
    init_db,
    qs_select_all,
    qs_select_one,
    qs_insert_one,
    qs_delete_one,
)
from app.local_llm import generate_story

app = FastAPI(title="AI Story Generator")

@app.on_event("startup")
async def startup_event():
    await init_db()

class StoryPrompt(BaseModel):
    prompt: str

class Story(BaseModel):
    id: int
    prompt: str
    content: str
    generation_duration: float

@app.get("/stories/")
async def get_stories(db: Connection = Depends(get_db)) -> list[Story]:
    pass  # will add ..

@app.post("/stories/", response_model=Story)
async def create_story(story_prompt: StoryPrompt, db: Connection = Depends(get_db)):
    pass  # ...

@app.get("/stories/{story_id}", response_model=Story)
async def get_story(story_id: int, db: Connection = Depends(get_db)):
    pass

@app.delete("/stories/{story_id}", status_code=200)
async def delete(story_id: int, db: Connection = Depends(get_db)):
    pass
Application Startup and Models¶
Let's look at how our application initializes and handles data:
The Startup Event¶
- This is like our application's morning routine - it runs once when the server starts up. In this case, it makes sure our database table exists and is ready to go. Think of it as setting up your workspace before starting your day!
Data Models¶
class StoryPrompt(BaseModel):
    prompt: str

class Story(BaseModel):
    id: int
    prompt: str
    content: str
    generation_duration: float
These Pydantic models are like our data's security guards:
- StoryPrompt checks that incoming story requests have a prompt
- Story defines what a complete story looks like in our system
- They automatically validate data and provide clear error messages if something's wrong
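A quick sketch of that validation in action (illustrative only, not part of the app itself):

from pydantic import ValidationError

StoryPrompt(prompt="A tale about a brave toaster")  # valid

try:
    StoryPrompt()  # missing "prompt"
except ValidationError as e:
    print(e)  # pinpoints exactly which field is missing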
Understanding the Routes¶
Now, let's dive into each of our routes and see what they do. This is where all the magic happens!
Create Story Route¶
@app.post("/stories/", response_model=Story)
async def create_story(story_prompt: StoryPrompt, db: Connection = Depends(get_db)):
    try:
        # Time the story generation
        start = time.perf_counter()
        content = await generate_story(story_prompt.prompt)
        generation_duration = time.perf_counter() - start
        # Store in database
        async with db.execute(
            qs_insert_one,
            (story_prompt.prompt, content, generation_duration),
        ) as cursor:
            row = await cursor.fetchone()
            await db.commit()
        return {
            "id": row[0],
            "prompt": row[1],
            "content": row[2],
            "generation_duration": row[3],
        }
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
This is our most complex route - it's like a story factory! Let's break it down:
- We accept a story prompt from the user
- Start a timer to measure generation time
- Generate the story using our local LLM
- Calculate how long it took
- Save everything to the database
- Return the complete story with its metadata
Get All Stories Route¶
@app.get("/stories/")
async def get_stories(db: Connection = Depends(get_db)) -> list[Story]:
    try:
        async with db.execute(qs_select_all) as cursor:
            rows = await cursor.fetchall()
            return [
                {
                    "id": row[0],
                    "prompt": row[1],
                    "content": row[2],
                    "generation_duration": row[3],
                }
                for row in rows
            ]
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
This route is like a librarian giving you a catalog of all stories. Here's what's happening:
- We get a database connection using our get_db dependency
- db.execute() runs our SELECT query
- cursor.fetchall() gets all the results
- We transform the raw database rows into a nice list of dictionaries
- If anything goes wrong, we return a 500 error
Get Single Story Route¶
@app.get("/stories/{story_id}", response_model=Story)
async def get_story(story_id: int, db: Connection = Depends(get_db)):
    async with db.execute(qs_select_one, (story_id,)) as cursor:
        row = await cursor.fetchone()
        if not row:
            raise HTTPException(status_code=404, detail="Story not found")
        return {
            "id": row[0],
            "prompt": row[1],
            "content": row[2],
            "generation_duration": row[3],
        }
Think of this route as looking up a specific book by its ID. It:
- Takes a story ID from the URL
- Looks up that specific story in the database
- Returns a 404 if it doesn't exist
- Returns the story if it does
Delete Story Route¶
@app.delete("/stories/{story_id}", status_code=200)
async def delete(story_id: int, db: Connection = Depends(get_db)):
    async with db.execute(qs_delete_one, (story_id,)):
        await db.commit()
    # total_changes tells us whether the DELETE actually removed a row
    if db.total_changes > 0:
        return {"message": f"{story_id} deleted successfully"}
    raise HTTPException(status_code=404, detail="Story not found")
- Tries to delete the specified story
- Checks whether a row was actually deleted (via db.total_changes)
- Returns a success message if it worked
- Returns a 404 if the story didn't exist
Understanding Database Operations¶
Let's look at some common database operations you'll see in the code:
- db.execute(): runs a SQL query
- cursor.fetchone(): gets one result row
- cursor.fetchall(): gets all result rows
- db.commit(): saves changes to the database
Think of these like:
- execute() is writing your question
- fetchone()/fetchall() is getting the answer(s)
- commit() is making sure your changes are saved
Remember: Always use async with when executing database operations - it's like making sure you clean up after yourself in the kitchen!
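For example, the async with form closes the cursor for you, even if an error occurs mid-query (a minimal sketch using this project's db connection and qs_select_one query; the manual equivalent would need an explicit await cursor.close()):

# Preferred: the cursor is closed automatically when the block exits
async with db.execute(qs_select_one, (1,)) as cursor:
    row = await cursor.fetchone()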
Running The Application¶
- Start Ollama server:
ollama serve
- Pull your preferred model:
ollama pull llama3.2:1b
- Start the FastAPI server:
uvicorn app.main:app --reload
- Test with curl:
# Create a story
curl -X POST http://localhost:8000/stories/ \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Generate a story for a 3 years old boy who loves repairing. Teach ethical morals with the story. Maximum 1 paragraph 16 sentences will be generated."}'

# List all stories
curl http://localhost:8000/stories/

# Get specific story
curl http://localhost:8000/stories/1

# Delete a story
curl -X DELETE http://localhost:8000/stories/1
Example stories¶
[
{
"id": 1,
"prompt": "generate 1 paragraph story",
"content": "As the sun set over the vast desert landscape, a lone figure emerged from the sand dunes. Kael, a skilled stargazer, had been tracking a rare celestial event for weeks - a triple-solar alignment that only occurred once in a lifetime. He made his way through the endless dunes, his worn boots sinking into the fine grains as he scanned the horizon for any sign of the phenomenon. As the stars began to twinkle like diamonds against the dark canvas above, Kael spotted it - a shimmering band of light that seemed to pulse with an otherworldly energy. With trembling hands, he raised his binoculars to get a closer look, and as he did, he felt the air vibrate with an electric anticipation, as if the very fabric of space itself was about to reveal its deepest secrets.",
"generation_duration": 3.041568958084099
},
{
"id": 2,
"prompt": "Generate a story for a 3 years old boy who loves repairing. Teach ethical morals with the story. Maximum 1 paragraph 16 sentences will be generated.",
"content": "There was a little robot named Zip, who loved to repair things. He would fix toys and gadgets, making them work again. One day, Zip saw some toys that were broken and had trash inside. He didn't want those toys to hurt anyone or the earth. So, he cleaned out the trash and put it in a special bin. When his friend, a boy named Max, came to him for help fixing something, Zip said, \"I'll fix it, but first, let's make sure we clean up.\" They picked up all the trash and put it in a big bin. Then, they fixed Max's toy car. It was shiny and new again! Max was happy, and he thanked Zip for his hard work. But then, Zip saw another toy that needed fixing too - one with paint that had fallen off. He said, \"Max, let me fix this first, so it can look nice.\" Max helped him clean the paint and fix the toy. Then they put the toys back on the shelf, and everyone was happy. They learned an important lesson: taking care of the world is like fixing a broken toy - we need to be kind and help keep everything working well.",
"generation_duration": 13.030234124977142
}
]
Common Pitfalls and Solutions¶
Ollama Server Issues¶
Problem: "Connection refused" errors Solution: Ensure Ollama server is running (ollama serve)
Slow Generation Times¶
Problem: Story generation takes too long
Solution:
- Use a smaller model (we already chose a 1B model instead of 7B; even smaller ones are available)
- Increase system resources allocated to Ollama
- Consider GPU acceleration if available
Memory Issues¶
Problem: System runs out of memory
Solution:
- Use smaller models
- Limit concurrent requests
- Monitor system resources
Database Locks¶
Problem: "Database is locked" errors Solution: Proper async connection handling (implemented in code
Quick Tips and Reminders¶
As you work with this code, keep in mind:
- Always handle database connections properly (our dependency system does this for us)
- Remember to commit changes when modifying data
- Use try/except blocks to handle errors gracefully
- Always bind query parameters with placeholders (?) to prevent SQL injection - see the contrast below
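A quick contrast on that last point (a sketch; the qs_* queries in this project already use placeholders, and user_input stands in for any untrusted value):

# Unsafe: interpolating user input invites SQL injection
await db.execute(f"SELECT * FROM t_stories WHERE id = {user_input}")

# Safe: let the driver bind the parameter
await db.execute("SELECT * FROM t_stories WHERE id = ?", (user_input,))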
Want to experiment? Try adding some new features! Maybe:
- Add a rating system for stories