Generating Unlimited Stories With an AI LLM on Your Computer

We will build an AI story generator with FastAPI, raw SQL, and Ollama.

Prerequisites and Setup Guide

Before we begin, you'll need:

  • Basic Python knowledge (functions, async/await, error handling)
  • Basic understanding of HTTP requests and REST APIs
  • Familiarity with SQL fundamentals
  • A code editor (VS Code recommended)
  • Python 3.9+ (the code uses built-in generic type hints like list[Story])
  • Ollama installed on your machine

Introduction: What Are We Building?

We're creating an AI-powered story generator web service that:

  1. Accepts story prompts from users
  2. Generates creative stories using LLM APIs
  3. Stores and retrieves stories using raw SQL
  4. Serves results via FastAPI endpoints

This project teaches valuable skills in:

  • Building modern async web APIs
  • Working with databases without ORMs
  • Integrating AI services
  • Structuring larger applications

Project Structure

ai-story/
├── app/
│   ├── __init__.py
│   ├── config.py
│   ├── database.py
│   ├── local_llm.py
│   └── main.py
├── .env
├── requirements.txt
└── stories.sqlite3

Installation

$ mkdir ai-story
$ cd ai-story
# create a virtual environment using the venv module (-m) and name it venv
$ python3.11 -m venv venv
$ source venv/bin/activate
# Windows users activate it like this instead:
# $ .\venv\Scripts\activate

Required packages:

pip install fastapi uvicorn python-dotenv aiosqlite httpx
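
If you prefer pinning dependencies in a requirements.txt (listed in the project structure above), a minimal version could simply name the same packages:

fastapi
uvicorn
python-dotenv
aiosqlite
httpx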

Introduction to Ollama

Ollama is a framework that helps you run large language models (LLMs) locally on your machine. Instead of relying on cloud-based services like OpenAI, you can run models like Llama 3 directly on your computer.

  • Home: https://ollama.com/
  • Download Ollama: https://ollama.com/download
  • Ollama LLM Models: https://ollama.com/library

Key Ollama Commands:

# Install Ollama (Mac/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# List available models
ollama list

# Pull a model (example: Llama 3)
ollama pull llama3.2:1b

# Start Ollama server
ollama serve

# Run a model in CLI
ollama run llama3.2:1b

The Ollama API runs on http://localhost:11434 by default.
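
Before wiring Ollama into our app, it's worth confirming the server is actually reachable. A minimal check from Python (the root endpoint normally answers with a short status message):

import httpx

# Quick sanity check that Ollama is listening on its default port;
# the root endpoint typically replies with "Ollama is running"
print(httpx.get("http://localhost:11434").text)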

Part 1: Environment Configuration

  • Create a .env file:
    DATABASE_URL=sqlite:///./stories.sqlite3
    LLM_MODEL=llama3.2:1b
    OLLAMA_API_URL=http://localhost:11434/api/generate
    

Step 1.1: Configuration Setup

  • Create config.py:
    from dotenv import load_dotenv
    import os
    
    load_dotenv()
    
    class Settings:
        DATABASE_URL = os.getenv("DATABASE_URL")
        LLM_MODEL = os.getenv("LLM_MODEL")
        OLLAMA_API_URL = os.getenv("OLLAMA_API_URL")
    
    settings = Settings()
    

Why This Matters:

  • Separates configuration from code
  • Makes it easy to switch between different LLM models
  • Allows for development/production environment differences
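
Once config.py is in place, any module can read the shared settings object - for example:

from app.config import settings

# Values are loaded from .env (or the process environment) by python-dotenv
print(settings.LLM_MODEL)       # e.g. "llama3.2:1b"
print(settings.OLLAMA_API_URL)  # e.g. "http://localhost:11434/api/generate"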

Step 1.2: Database Setup

  • Create database.py:
import aiosqlite

DB_NAME = "stories.sqlite3"
TABLE_NAME = "t_stories"

# SQL Queries as constants for better maintenance
qs_select_all = f"SELECT id, prompt, content, generation_duration FROM {TABLE_NAME} "
qs_select_one = f"SELECT id, prompt, content, generation_duration FROM {TABLE_NAME} WHERE id = ?"
qs_delete_one = f"DELETE FROM {TABLE_NAME} WHERE id = ?"
# note: the RETURNING clause requires SQLite 3.35+ (bundled with recent Python builds)
qs_insert_one = f"INSERT INTO {TABLE_NAME} (prompt, content, generation_duration) VALUES (?, ?, ?) RETURNING id, prompt, content, generation_duration"

async def init_db():
    async with aiosqlite.connect(DB_NAME) as db:
        await db.execute(
            f"""
            CREATE TABLE IF NOT EXISTS {TABLE_NAME} (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                prompt TEXT NOT NULL,
                content TEXT NOT NULL,
                generation_duration REAL NOT NULL,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
            )
        """
        )
        await db.commit()

async def get_db():
    db = await aiosqlite.connect(DB_NAME)
    try:
        yield db
    finally:
        await db.close()
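
If you want to create the table without starting the web server (say, from a one-off script), you can call init_db directly - a small sketch:

import asyncio

from app.database import init_db

# Creates stories.sqlite3 and the t_stories table if they don't exist yet
asyncio.run(init_db())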

Key Points:

  • Using constants for SQL queries improves maintainability
  • The generation_duration column lets us track how long each story took to generate
  • Consistent table naming with a prefix (t_stories)

Understanding get_db and Dependencies

This might look simple, but there's a lot of magic happening here! Let me break it down:

  • First, we create an async function that connects to our database
  • We use yield instead of return - this makes it a "dependency generator"
  • The try/finally block ensures our database connection always gets closed, even if there's an error

Think of get_db like a responsible librarian:

  • When you need a book (database connection), they get it for you
  • They keep track of who has what book
  • When you're done, they make sure the book gets put back properly

Now, here's how we use it in our routes:

    @app.get("/stories/")
    async def get_stories(db: Connection = Depends(get_db)) -> list[Story]:
    

See that Depends(get_db) part? It's telling FastAPI:

    "Hey, before running this function, I need a database connection. Use the get_db function to get one. When the function is done, clean up the connection."

Cool, right? This way, we never have to worry about managing database connections manually!
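
If you're curious what FastAPI does behind the scenes with a yield-based dependency, you can approximate it yourself with contextlib.asynccontextmanager - a rough sketch for understanding only, not something the app needs:

import asyncio
import contextlib

from app.database import get_db

async def main():
    # asynccontextmanager turns our generator into an async context manager,
    # which is roughly what FastAPI does around each request
    async with contextlib.asynccontextmanager(get_db)() as db:
        async with db.execute("SELECT 1") as cursor:
            print(await cursor.fetchone())  # (1,)

asyncio.run(main())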

Part 2: Local LLM Integration

  • Create local_llm.py:
import httpx
from app.config import settings

OLLAMA_API_URL = settings.OLLAMA_API_URL or "http://localhost:11434/api/generate"
LLM_MODEL = settings.LLM_MODEL or "llama3.2:1b"

async def generate_story(prompt: str) -> str:
    async with httpx.AsyncClient() as client:
        try:
            response = await client.post(
                OLLAMA_API_URL,
                json={
                    "prompt": prompt,
                    "model": LLM_MODEL,
                    "stream": False
                },
                headers={"Content-Type": "application/json"},
                timeout=300.0,
            )
            response.raise_for_status()
            res = response.json()
            return res.get("response")
        except httpx.HTTPStatusError as he:
            print(f"HTTP Error: {he.response.status_code} - {he.response.text}")
            raise Exception(f"Http Error: {str(he)}")
        except Exception as e:
            raise Exception(f"Story generation failed: {str(e)}")

Important Points:

  • Using httpx for async HTTP requests
  • Extended timeout for local model inference
  • Proper error handling for HTTP and general errors
  • Non-streaming mode for simpler implementation
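
On that last point: if you later want tokens to appear as they're generated, Ollama can also stream its output. With "stream": true it returns one JSON object per line, each carrying a "response" fragment. A hedged sketch of how local_llm.py could expose that, reusing the module constants above (not used elsewhere in this tutorial):

import json

import httpx

async def generate_story_stream(prompt: str):
    # Yields story text chunk by chunk as Ollama produces it
    async with httpx.AsyncClient(timeout=300.0) as client:
        async with client.stream(
            "POST",
            OLLAMA_API_URL,
            json={"prompt": prompt, "model": LLM_MODEL, "stream": True},
        ) as response:
            response.raise_for_status()
            async for line in response.aiter_lines():
                if not line:
                    continue
                chunk = json.loads(line)
                yield chunk.get("response", "")
                if chunk.get("done"):
                    break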

Part 3: FastAPI Application

  • Create main.py:
    import time
    from aiosqlite import Connection
    from fastapi import FastAPI, Depends, HTTPException
    from pydantic import BaseModel
    
    from app.database import (
        get_db,
        init_db,
        qs_select_all,
        qs_select_one,
        qs_insert_one,
        qs_delete_one,
    )
    from app.local_llm import generate_story
    
    app = FastAPI(title="AI Story Generator")
    
    @app.on_event("startup")
    async def startup_event():
        await init_db()
    
    class StoryPrompt(BaseModel):
        prompt: str
    
    class Story(BaseModel):
        id: int
        prompt: str
        content: str
        generation_duration: float
    
    @app.get("/stories/")
    async def get_stories(db: Connection = Depends(get_db)) -> list[Story]:
        pass  # implemented below in "Understanding the Routes"
    
    @app.post("/stories/", response_model=Story)
    async def create_story(story_prompt: StoryPrompt, db: Connection = Depends(get_db)):
        pass  # implemented below
    
    @app.get("/stories/{story_id}", response_model=Story)
    async def get_story(story_id: int, db: Connection = Depends(get_db)):
        pass
    
    @app.delete("/stories/{story_id}", status_code=200)
    async def delete(story_id: int, db: Connection = Depends(get_db)):
        pass
    

Application Startup and Models

Let's look at how our application initializes and handles data:

The Startup Event

@app.on_event("startup")
async def startup_event():
    await init_db()

This is like our application's morning routine - it runs once when the server starts up. In this case, it makes sure our database table exists and is ready to go. Think of it as setting up your workspace before starting your day!
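
One caveat: recent FastAPI releases deprecate @app.on_event in favor of a lifespan handler. If your FastAPI version warns about it, the equivalent setup looks roughly like this:

from contextlib import asynccontextmanager

from fastapi import FastAPI

from app.database import init_db

@asynccontextmanager
async def lifespan(app: FastAPI):
    await init_db()  # runs once at startup, before requests are served
    yield            # the application handles requests while suspended here

app = FastAPI(title="AI Story Generator", lifespan=lifespan)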

Data Models

class StoryPrompt(BaseModel):
    prompt: str

class Story(BaseModel):
    id: int
    prompt: str
    content: str
    generation_duration: float

These Pydantic models are like our data's security guards:

  • StoryPrompt checks that incoming story requests have a prompt
  • Story defines what a complete story looks like in our system
  • They automatically validate data and provide clear error messages if something's wrong
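
You can watch the validation work without even starting the server - constructing a model with bad data raises a descriptive error:

from pydantic import ValidationError

try:
    StoryPrompt()  # missing the required "prompt" field
except ValidationError as e:
    print(e)  # explains that "prompt" is required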

Understanding the Routes

Now, let's dive into each of our routes and see what they do. This is where all the magic happens!

Create Story Route

@app.post("/stories/", response_model=Story)
async def create_story(story_prompt: StoryPrompt, db: Connection = Depends(get_db)):
    try:
        # Time the story generation
        start = time.perf_counter()
        content = await generate_story(story_prompt.prompt)
        generation_duration = time.perf_counter() - start

        # Store in database
        async with db.execute(
            qs_insert_one,
            (story_prompt.prompt, content, generation_duration),
        ) as cursor:
            row = await cursor.fetchone()
            await db.commit()

        return {
            "id": row[0],
            "prompt": row[1],
            "content": row[2],
            "generation_duration": row[3],
        }
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

This is our most complex route - it's like a story factory! Let's break it down:

  • We accept a story prompt from the user
  • Start a timer to measure generation time
  • Generate the story using our local LLM
  • Calculate how long it took
  • Save everything to the database
  • Return the complete story with its metadata

Get All Stories Route

@app.get("/stories/")
async def get_stories(db: Connection = Depends(get_db)) -> list[Story]:
    try:
        async with db.execute(qs_select_all) as cursor:
            rows = await cursor.fetchall()
            return [
                {
                    "id": row[0],
                    "prompt": row[1],
                    "content": row[2],
                    "generation_duration": row[3],
                }
                for row in rows
            ]
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

This route is like a librarian giving you a catalog of all stories. Here's what's happening:

  • We get a database connection using our get_db dependency
  • db.execute() runs our SELECT query
  • cursor.fetchall() gets all the results
  • We transform the raw database rows into a nice list of dictionaries
  • If anything goes wrong, we return a 500 error
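
One aside: indexing rows by position (row[0], row[1], ...) works, but it silently breaks if the column order in the query ever changes. aiosqlite also supports sqlite3-style row factories so columns can be read by name - a sketch of a hypothetical variant route using that approach:

import aiosqlite

@app.get("/stories-by-name/")  # hypothetical route, purely for illustration
async def get_stories_by_name(db: aiosqlite.Connection = Depends(get_db)) -> list[Story]:
    db.row_factory = aiosqlite.Row  # rows now allow access by column name
    async with db.execute(qs_select_all) as cursor:
        rows = await cursor.fetchall()
        return [dict(row) for row in rows]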

Get Single Story Route

@app.get("/stories/{story_id}", response_model=Story)
async def get_story(story_id: int, db: Connection = Depends(get_db)):
    async with db.execute(qs_select_one, (story_id,)) as cursor:
        row = await cursor.fetchone()

    if not row:
        raise HTTPException(status_code=404, detail="Story not found")

    return {
        "id": row[0],
        "prompt": row[1],
        "content": row[2],
        "generation_duration": row[3],
    }

Think of this route as looking up a specific book by its ID. It:

  • Takes a story ID from the URL
  • Looks up that specific story in the database
  • Returns a 404 if it doesn't exist
  • Returns the story if it does

Delete Story Route

@app.delete("/stories/{story_id}", status_code=200)
async def delete(story_id: int, db: Connection = Depends(get_db)):
    async with db.execute(qs_delete_one, (story_id,)):
        await db.commit()
    # total_changes counts rows modified on this connection; since get_db
    # opens a fresh connection per request, it reflects just this DELETE
    if db.total_changes > 0:
        return {"message": f"{story_id} deleted successfully"}
    raise HTTPException(status_code=404, detail="Story not found")

This route is our cleanup crew! It:

  • Tries to delete the specified story
  • Checks if anything was actually deleted
  • Returns a success message if it worked
  • Returns a 404 error if the story didn't exist

Understanding Database Operations

Let's look at some common database operations you'll see in the code:

  • db.execute(): Runs a SQL query
  • cursor.fetchone(): Gets one result row
  • cursor.fetchall(): Gets all result rows
  • db.commit(): Saves changes to the database

Think of these like:

  • execute() is writing your question
  • fetchone()/fetchall() is getting the answer(s)
  • commit() is making sure your changes are saved

Remember: Always use async with when executing database operations - it's like making sure you clean up after yourself in the kitchen!
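
Here's a tiny standalone script that exercises these operations against the tutorial's database file (assuming it already exists):

import asyncio

import aiosqlite

async def demo():
    async with aiosqlite.connect("stories.sqlite3") as db:
        # execute() runs a query; async with closes the cursor afterwards
        async with db.execute("SELECT COUNT(*) FROM t_stories") as cursor:
            (count,) = await cursor.fetchone()  # fetchone() returns a single row
            print(f"{count} stories stored")
        await db.execute("DELETE FROM t_stories WHERE id = ?", (-1,))
        await db.commit()  # commit() persists writes (a no-op here: nothing matched)

asyncio.run(demo())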

Running The Application

  1. Start Ollama server: ollama serve
  2. Pull your preferred model: ollama pull llama3.2:1b
  3. Start the FastAPI server: uvicorn app.main:app --reload
  4. Test with curl:
    # Create a story
    curl -X POST http://localhost:8000/stories/ \
         -H "Content-Type: application/json" \
         -d '{"prompt": "Generate a story for a 3 years old boy who loves repairing. Teach ethical morals with the story. Maximum 1 paragraph 16 sentences will be generated."}'
    
    # List all stories
    curl http://localhost:8000/stories/
    
    # Get specific story
    curl http://localhost:8000/stories/1
    
    # Delete a story
    curl -X DELETE http://localhost:8000/stories/1
    
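Prefer Python over curl? A short httpx script can drive the same endpoints (assuming the server is running on port 8000):

import httpx

BASE = "http://localhost:8000"

# Create a story, then read it back and delete it
created = httpx.post(
    f"{BASE}/stories/",
    json={"prompt": "generate 1 paragraph story"},
    timeout=300.0,  # local generation can take a while
).json()
print(created["generation_duration"])

print(httpx.get(f"{BASE}/stories/{created['id']}").json()["content"][:80])
print(httpx.delete(f"{BASE}/stories/{created['id']}").json())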

Example Stories

[
  {
    "id": 1,
    "prompt": "generate 1 paragraph story",
    "content": "As the sun set over the vast desert landscape, a lone figure emerged from the sand dunes. Kael, a skilled stargazer, had been tracking a rare celestial event for weeks - a triple-solar alignment that only occurred once in a lifetime. He made his way through the endless dunes, his worn boots sinking into the fine grains as he scanned the horizon for any sign of the phenomenon. As the stars began to twinkle like diamonds against the dark canvas above, Kael spotted it - a shimmering band of light that seemed to pulse with an otherworldly energy. With trembling hands, he raised his binoculars to get a closer look, and as he did, he felt the air vibrate with an electric anticipation, as if the very fabric of space itself was about to reveal its deepest secrets.",
    "generation_duration": 3.041568958084099
  },
  {
    "id": 2,
    "prompt": "Generate a story for a 3 years old boy who loves repairing. Teach ethical morals with the story. Maximum 1 paragraph 16 sentences will be generated.",
    "content": "There was a little robot named Zip, who loved to repair things. He would fix toys and gadgets, making them work again. One day, Zip saw some toys that were broken and had trash inside. He didn't want those toys to hurt anyone or the earth. So, he cleaned out the trash and put it in a special bin. When his friend, a boy named Max, came to him for help fixing something, Zip said, \"I'll fix it, but first, let's make sure we clean up.\" They picked up all the trash and put it in a big bin. Then, they fixed Max's toy car. It was shiny and new again! Max was happy, and he thanked Zip for his hard work. But then, Zip saw another toy that needed fixing too - one with paint that had fallen off. He said, \"Max, let me fix this first, so it can look nice.\" Max helped him clean the paint and fix the toy. Then they put the toys back on the shelf, and everyone was happy. They learned an important lesson: taking care of the world is like fixing a broken toy - we need to be kind and help keep everything working well.",
    "generation_duration": 13.030234124977142
  }
]

Common Pitfalls and Solutions

Ollama Server Issues

Problem: "Connection refused" errors Solution: Ensure Ollama server is running (ollama serve)

Slow Generation Times

Problem: Story generation takes too long
Solution:

  • Use a smaller model (we used a 1B model instead of a 7B one; even smaller variants exist)
  • Increase system resources allocated to Ollama
  • Consider GPU acceleration if available

Memory Issues

Problem: System runs out of memory
Solution:

  • Use smaller models
  • Limit concurrent requests
  • Monitor system resources

Database Locks

Problem: "Database is locked" errors Solution: Proper async connection handling (implemented in code

Quick Tips and Reminders

As you work with this code, keep in mind:

  • Always handle database connections properly (our dependency system does this for us)
  • Remember to commit changes when modifying data
  • Use try/except blocks to handle errors gracefully
  • Keep track of your query parameters to prevent SQL injection

Want to experiment? Try adding some new features! Maybe:

  • Add a rating system for stories