Generating Unlimited Stories With an AI LLM on Your Computer

We will build an AI story generator with FastAPI, raw SQL, and Ollama.

Prerequisites and Setup Guide

Before we begin, you'll need:

  • Basic Python knowledge (functions, async/await, error handling)
  • Basic understanding of HTTP requests and REST APIs
  • Familiarity with SQL fundamentals
  • A code editor (VS Code recommended)
  • Python 3.9+ (the code uses built-in generic type hints like list[Story])
  • Ollama installed on your machine

Introduction: What Are We Building?

We're creating an AI-powered story generator web service that:

  1. Accepts story prompts from users
  2. Generates creative stories using LLM APIs
  3. Stores and retrieves stories using raw SQL
  4. Serves results via FastAPI endpoints

This project teaches valuable skills in:

  • Building modern async web APIs
  • Working with databases without ORMs
  • Integrating AI services
  • Structuring larger applications

Project Structure

ai-story/
├── app/
│   ├── __init__.py
│   ├── config.py
│   ├── database.py
│   ├── local_llm.py
│   └── main.py
├── .env
├── requirements.txt
└── stories.sqlite3

Installation

$ mkdir ai-story
$ cd ai-story
# create a virtual environment using the venv module (-m) and name it venv
$ python3.11 -m venv venv
$ source venv/bin/activate
# Windows users activate it like this instead:
# $ .\venv\Scripts\activate

Required packages:

pip install fastapi uvicorn python-dotenv aiosqlite httpx
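
If you prefer pinning dependencies in a requirements.txt (listed in the project structure above), a minimal version could simply name the same packages:

fastapi
uvicorn
python-dotenv
aiosqlite
httpx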

Introduction to Ollama

Ollama is a framework that helps you run large language models (LLMs) locally on your machine. Instead of relying on cloud-based services like OpenAI, you can run models like Llama 3 directly on your computer.

  • Home: https://ollama.com/
  • Download Ollama: https://ollama.com/download
  • Ollama LLM Models: https://ollama.com/library

Key Ollama Commands:

# Install Ollama (Mac/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# List available models
ollama list

# Pull a model (example: Llama 3)
ollama pull llama3.2:1b

# Start Ollama server
ollama serve

# Run a model in CLI
ollama run llama3.2:1b

The Ollama API runs on http://localhost:11434 by default.
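
Before wiring Ollama into our app, it's worth confirming the server is actually reachable. A minimal check from Python (the root endpoint normally answers with a short status message):

import httpx

# Quick sanity check that Ollama is listening on its default port;
# the root endpoint typically replies with "Ollama is running"
print(httpx.get("http://localhost:11434").text)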

Part 1: Environment Configuration

  • Create a .env file:
    DATABASE_URL=sqlite:///./stories.sqlite3
    LLM_MODEL=llama3.2:1b
    OLLAMA_API_URL=http://localhost:11434/api/generate
    

Step 1.1: Configuration Setup

  • Create config.py:
    from dotenv import load_dotenv
    import os
    
    load_dotenv()
    
    class Settings:
        DATABASE_URL = os.getenv("DATABASE_URL")
        LLM_MODEL = os.getenv("LLM_MODEL")
        OLLAMA_API_URL = os.getenv("OLLAMA_API_URL")
    
    settings = Settings()
    

Why This Matters:

  • Separates configuration from code
  • Makes it easy to switch between different LLM models
  • Allows for development/production environment differences
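
Once config.py is in place, any module can read the shared settings object - for example:

from app.config import settings

# Values are loaded from .env (or the process environment) by python-dotenv
print(settings.LLM_MODEL)       # e.g. "llama3.2:1b"
print(settings.OLLAMA_API_URL)  # e.g. "http://localhost:11434/api/generate"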

Step 1.2: Database Setup

  • Create database.py:
import aiosqlite

DB_NAME = "stories.sqlite3"
TABLE_NAME = "t_stories"

# SQL Queries as constants for better maintenance
qs_select_all = f"SELECT id, prompt, content, generation_duration FROM {TABLE_NAME} "
qs_select_one = f"SELECT id, prompt, content, generation_duration FROM {TABLE_NAME} WHERE id = ?"
qs_delete_one = f"DELETE FROM {TABLE_NAME} WHERE id = ?"
# note: the RETURNING clause requires SQLite 3.35+ (bundled with recent Python builds)
qs_insert_one = f"INSERT INTO {TABLE_NAME} (prompt, content, generation_duration) VALUES (?, ?, ?) RETURNING id, prompt, content, generation_duration"

async def init_db():
    async with aiosqlite.connect(DB_NAME) as db:
        await db.execute(
            f"""
            CREATE TABLE IF NOT EXISTS {TABLE_NAME} (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                prompt TEXT NOT NULL,
                content TEXT NOT NULL,
                generation_duration REAL NOT NULL,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
            )
        """
        )
        await db.commit()

async def get_db():
    db = await aiosqlite.connect(DB_NAME)
    try:
        yield db
    finally:
        await db.close()
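
If you want to create the table without starting the web server (say, from a one-off script), you can call init_db directly - a small sketch:

import asyncio

from app.database import init_db

# Creates stories.sqlite3 and the t_stories table if they don't exist yet
asyncio.run(init_db())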

Key Points:

  • Using constants for SQL queries improves maintainability
  • The generation_duration column lets us track how long each story took to generate
  • Consistent table naming with a prefix (t_stories)

Understanding get_db and Dependencies

This might look simple, but there's a lot of magic happening here! Let me break it down:

  • First, we create an async function that connects to our database
  • We use yield instead of return - this makes it a "dependency generator"
  • The try/finally block ensures our database connection always gets closed, even if there's an error

Think of get_db like a responsible librarian:

  • When you need a book (database connection), they get it for you
  • They keep track of who has what book
  • When you're done, they make sure the book gets put back properly

Now, here's how we use it in our routes:

    @app.get("/stories/")
    async def get_stories(db: Connection = Depends(get_db)) -> list[Story]:
    

See that Depends(get_db) part? It's telling FastAPI:

    "Hey, before running this function, I need a database connection. Use the get_db function to get one. When the function is done, clean up the connection."

Cool, right? This way, we never have to worry about managing database connections manually!
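
If you're curious what FastAPI does behind the scenes with a yield-based dependency, you can approximate it yourself with contextlib.asynccontextmanager - a rough sketch for understanding only, not something the app needs:

import asyncio
import contextlib

from app.database import get_db

async def main():
    # asynccontextmanager turns our generator into an async context manager,
    # which is roughly what FastAPI does around each request
    async with contextlib.asynccontextmanager(get_db)() as db:
        async with db.execute("SELECT 1") as cursor:
            print(await cursor.fetchone())  # (1,)

asyncio.run(main())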

Part 2: Local LLM Integration

  • Create local_llm.py:
import httpx
from app.config import settings

OLLAMA_API_URL = settings.OLLAMA_API_URL or "http://localhost:11434/api/generate"
LLM_MODEL = settings.LLM_MODEL or "llama3.2:1b"

async def generate_story(prompt: str) -> str:
    async with httpx.AsyncClient() as client:
        try:
            response = await client.post(
                OLLAMA_API_URL,
                json={
                    "prompt": prompt,
                    "model": LLM_MODEL,
                    "stream": False
                },
                headers={"Content-Type": "application/json"},
                timeout=300.0,
            )
            response.raise_for_status()
            res = response.json()
            return res.get("response")
        except httpx.HTTPStatusError as he:
            print(f"HTTP Error: {he.response.status_code} - {he.response.text}")
            raise Exception(f"Http Error: {str(he)}")
        except Exception as e:
            raise Exception(f"Story generation failed: {str(e)}")

Important Points:

  • Using httpx for async HTTP requests
  • Extended timeout for local model inference
  • Proper error handling for HTTP and general errors
  • Non-streaming mode for simpler implementation
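
On that last point: if you later want tokens to appear as they're generated, Ollama can also stream its output. With "stream": true it returns one JSON object per line, each carrying a "response" fragment. A hedged sketch of how local_llm.py could expose that, reusing the module constants above (not used elsewhere in this tutorial):

import json

import httpx

async def generate_story_stream(prompt: str):
    # Yields story text chunk by chunk as Ollama produces it
    async with httpx.AsyncClient(timeout=300.0) as client:
        async with client.stream(
            "POST",
            OLLAMA_API_URL,
            json={"prompt": prompt, "model": LLM_MODEL, "stream": True},
        ) as response:
            response.raise_for_status()
            async for line in response.aiter_lines():
                if not line:
                    continue
                chunk = json.loads(line)
                yield chunk.get("response", "")
                if chunk.get("done"):
                    break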

Part 3: FastAPI Application

  • Create main.py:
    import time
    from aiosqlite import Connection
    from fastapi import FastAPI, Depends, HTTPException
    from pydantic import BaseModel
    
    from app.database import (
        get_db,
        init_db,
        qs_select_all,
        qs_select_one,
        qs_insert_one,
        qs_delete_one,
    )
    from app.local_llm import generate_story
    
    app = FastAPI(title="AI Story Generator")
    
    @app.on_event("startup")
    async def startup_event():
        await init_db()
    
    class StoryPrompt(BaseModel):
        prompt: str
    
    class Story(BaseModel):
        id: int
        prompt: str
        content: str
        generation_duration: float
    
    @app.get("/stories/")
    async def get_stories(db: Connection = Depends(get_db)) -> list[Story]:
        pass  # implemented below in "Understanding the Routes"
    
    @app.post("/stories/", response_model=Story)
    async def create_story(story_prompt: StoryPrompt, db: Connection = Depends(get_db)):
        pass  # implemented below
    
    @app.get("/stories/{story_id}", response_model=Story)
    async def get_story(story_id: int, db: Connection = Depends(get_db)):
        pass
    
    @app.delete("/stories/{story_id}", status_code=200)
    async def delete(story_id: int, db: Connection = Depends(get_db)):
        pass
    

Application Startup and Models

Let's look at how our application initializes and handles data:

The Startup Event

@app.on_event("startup")
async def startup_event():
    await init_db()

This is like our application's morning routine - it runs once when the server starts up. In this case, it makes sure our database table exists and is ready to go. Think of it as setting up your workspace before starting your day!
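
One caveat: recent FastAPI releases deprecate @app.on_event in favor of a lifespan handler. If your FastAPI version warns about it, the equivalent setup looks roughly like this:

from contextlib import asynccontextmanager

from fastapi import FastAPI

from app.database import init_db

@asynccontextmanager
async def lifespan(app: FastAPI):
    await init_db()  # runs once at startup, before requests are served
    yield            # the application handles requests while suspended here

app = FastAPI(title="AI Story Generator", lifespan=lifespan)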

Data Models

class StoryPrompt(BaseModel):
    prompt: str

class Story(BaseModel):
    id: int
    prompt: str
    content: str
    generation_duration: float

These Pydantic models are like our data's security guards:

  • StoryPrompt checks that incoming story requests have a prompt
  • Story defines what a complete story looks like in our system
  • They automatically validate data and provide clear error messages if something's wrong
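
You can watch the validation work without even starting the server - constructing a model with bad data raises a descriptive error:

from pydantic import ValidationError

try:
    StoryPrompt()  # missing the required "prompt" field
except ValidationError as e:
    print(e)  # explains that "prompt" is required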

Understanding the Routes

Now, let's dive into each of our routes and see what they do. This is where all the magic happens!

Create Story Route

@app.post("/stories/", response_model=Story)
async def create_story(story_prompt: StoryPrompt, db: Connection = Depends(get_db)):
    try:
        # Time the story generation
        start = time.perf_counter()
        content = await generate_story(story_prompt.prompt)
        generation_duration = time.perf_counter() - start

        # Store in database
        async with db.execute(
            qs_insert_one,
            (story_prompt.prompt, content, generation_duration),
        ) as cursor:
            row = await cursor.fetchone()
            await db.commit()

        return {
            "id": row[0],
            "prompt": row[1],
            "content": row[2],
            "generation_duration": row[3],
        }
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

This is our most complex route - it's like a story factory! Let's break it down:

  • We accept a story prompt from the user
  • Start a timer to measure generation time
  • Generate the story using our local LLM
  • Calculate how long it took
  • Save everything to the database
  • Return the complete story with its metadata

Get All Stories Route

@app.get("/stories/")
async def get_stories(db: Connection = Depends(get_db)) -> list[Story]:
    try:
        async with db.execute(qs_select_all) as cursor:
            rows = await cursor.fetchall()
            return [
                {
                    "id": row[0],
                    "prompt": row[1],
                    "content": row[2],
                    "generation_duration": row[3],
                }
                for row in rows
            ]
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

This route is like a librarian giving you a catalog of all stories. Here's what's happening:

  • We get a database connection using our get_db dependency
  • db.execute() runs our SELECT query
  • cursor.fetchall() gets all the results
  • We transform the raw database rows into a nice list of dictionaries
  • If anything goes wrong, we return a 500 error
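
One aside: indexing rows by position (row[0], row[1], ...) works, but it silently breaks if the column order in the query ever changes. aiosqlite also supports sqlite3-style row factories so columns can be read by name - a sketch of a hypothetical variant route using that approach:

import aiosqlite

@app.get("/stories-by-name/")  # hypothetical route, purely for illustration
async def get_stories_by_name(db: aiosqlite.Connection = Depends(get_db)) -> list[Story]:
    db.row_factory = aiosqlite.Row  # rows now allow access by column name
    async with db.execute(qs_select_all) as cursor:
        rows = await cursor.fetchall()
        return [dict(row) for row in rows]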

Get Single Story Route

@app.get("/stories/{story_id}", response_model=Story)
async def get_story(story_id: int, db: Connection = Depends(get_db)):
    async with db.execute(qs_select_one, (story_id,)) as cursor:
        row = await cursor.fetchone()

    if not row:
        raise HTTPException(status_code=404, detail="Story not found")

    return {
        "id": row[0],
        "prompt": row[1],
        "content": row[2],
        "generation_duration": row[3],
    }

Think of this route as looking up a specific book by its ID. It:

  • Takes a story ID from the URL
  • Looks up that specific story in the database
  • Returns a 404 if it doesn't exist
  • Returns the story if it does

Delete Story Route

@app.delete("/stories/{story_id}", status_code=200)
async def delete(story_id: int, db: Connection = Depends(get_db)):
    async with db.execute(qs_delete_one, (story_id,)):
        await db.commit()
    # total_changes counts rows modified on this connection; since get_db
    # opens a fresh connection per request, it reflects just this DELETE
    if db.total_changes > 0:
        return {"message": f"{story_id} deleted successfully"}
    raise HTTPException(status_code=404, detail="Story not found")

This route is our cleanup crew! It:

  • Tries to delete the specified story
  • Checks if anything was actually deleted
  • Returns a success message if it worked
  • Returns a 404 error if the story didn't exist

Understanding Database Operations

Let's look at some common database operations you'll see in the code:

  • db.execute(): Runs a SQL query
  • cursor.fetchone(): Gets one result row
  • cursor.fetchall(): Gets all result rows
  • db.commit(): Saves changes to the database

Think of these like:

  • execute() is writing your question
  • fetchone()/fetchall() is getting the answer(s)
  • commit() is making sure your changes are saved

Remember: Always use async with when executing database operations - it's like making sure you clean up after yourself in the kitchen!
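
Here's a tiny standalone script that exercises these operations against the tutorial's database file (assuming it already exists):

import asyncio

import aiosqlite

async def demo():
    async with aiosqlite.connect("stories.sqlite3") as db:
        # execute() runs a query; async with closes the cursor afterwards
        async with db.execute("SELECT COUNT(*) FROM t_stories") as cursor:
            (count,) = await cursor.fetchone()  # fetchone() returns a single row
            print(f"{count} stories stored")
        await db.execute("DELETE FROM t_stories WHERE id = ?", (-1,))
        await db.commit()  # commit() persists writes (a no-op here: nothing matched)

asyncio.run(demo())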

Running The Application

  1. Start Ollama server: ollama serve
  2. Pull your preferred model: ollama pull llama3.2:1b
  3. Start the FastAPI server: uvicorn app.main:app --reload
  4. Test with curl:
    # Create a story
    curl -X POST http://localhost:8000/stories/ \
         -H "Content-Type: application/json" \
         -d '{"prompt": "Generate a story for a 3 years old boy who loves repairing. Teach ethical morals with the story. Maximum 1 paragraph 16 sentences will be generated."}'
    
    # List all stories
    curl http://localhost:8000/stories/
    
    # Get specific story
    curl http://localhost:8000/stories/1
    
    # Delete a story
    curl -X DELETE http://localhost:8000/stories/1
    
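Prefer Python over curl? A short httpx script can drive the same endpoints (assuming the server is running on port 8000):

import httpx

BASE = "http://localhost:8000"

# Create a story, then read it back and delete it
created = httpx.post(
    f"{BASE}/stories/",
    json={"prompt": "generate 1 paragraph story"},
    timeout=300.0,  # local generation can take a while
).json()
print(created["generation_duration"])

print(httpx.get(f"{BASE}/stories/{created['id']}").json()["content"][:80])
print(httpx.delete(f"{BASE}/stories/{created['id']}").json())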

Example Stories

[
  {
    "id": 1,
    "prompt": "generate 1 paragraph story",
    "content": "As the sun set over the vast desert landscape, a lone figure emerged from the sand dunes. Kael, a skilled stargazer, had been tracking a rare celestial event for weeks - a triple-solar alignment that only occurred once in a lifetime. He made his way through the endless dunes, his worn boots sinking into the fine grains as he scanned the horizon for any sign of the phenomenon. As the stars began to twinkle like diamonds against the dark canvas above, Kael spotted it - a shimmering band of light that seemed to pulse with an otherworldly energy. With trembling hands, he raised his binoculars to get a closer look, and as he did, he felt the air vibrate with an electric anticipation, as if the very fabric of space itself was about to reveal its deepest secrets.",
    "generation_duration": 3.041568958084099
  },
  {
    "id": 2,
    "prompt": "Generate a story for a 3 years old boy who loves repairing. Teach ethical morals with the story. Maximum 1 paragraph 16 sentences will be generated.",
    "content": "There was a little robot named Zip, who loved to repair things. He would fix toys and gadgets, making them work again. One day, Zip saw some toys that were broken and had trash inside. He didn't want those toys to hurt anyone or the earth. So, he cleaned out the trash and put it in a special bin. When his friend, a boy named Max, came to him for help fixing something, Zip said, \"I'll fix it, but first, let's make sure we clean up.\" They picked up all the trash and put it in a big bin. Then, they fixed Max's toy car. It was shiny and new again! Max was happy, and he thanked Zip for his hard work. But then, Zip saw another toy that needed fixing too - one with paint that had fallen off. He said, \"Max, let me fix this first, so it can look nice.\" Max helped him clean the paint and fix the toy. Then they put the toys back on the shelf, and everyone was happy. They learned an important lesson: taking care of the world is like fixing a broken toy - we need to be kind and help keep everything working well.",
    "generation_duration": 13.030234124977142
  }
]

Common Pitfalls and Solutions

Ollama Server Issues

Problem: "Connection refused" errors Solution: Ensure Ollama server is running (ollama serve)

Slow Generation Times

Problem: Story generation takes too long
Solution:

  • Use a smaller model (we used a 1B model instead of a 7B one; even smaller variants exist)
  • Increase system resources allocated to Ollama
  • Consider GPU acceleration if available

Memory Issues

Problem: System runs out of memory
Solution:

  • Use smaller models
  • Limit concurrent requests
  • Monitor system resources

Database Locks

Problem: "Database is locked" errors Solution: Proper async connection handling (implemented in code

Quick Tips and Reminders

As you work with this code, keep in mind:

  • Always handle database connections properly (our dependency system does this for us)
  • Remember to commit changes when modifying data
  • Use try/except blocks to handle errors gracefully
  • Keep track of your query parameters to prevent SQL injection

Want to experiment? Try adding some new features! Maybe:

  • Add a rating system for stories