PDF Text Extractor

Custom / General Framework

Skill Description

Extracts text from a PDF file page by page and returns it as a single string, with pages separated by blank lines. Suited to RAG (Retrieval-Augmented Generation) pipelines.

skill_manifest.yaml / config.json

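from pypdf import PdfReader


def extract_pdf_text(file_path: str) -> str:
    """
    Read a PDF file and return the text of all pages as one string.

    Pages are separated by a blank line; pages that yield no text
    (for example, scanned images without a text layer) are skipped.
    """
    reader = PdfReader(file_path)
    full_text = []
    for page in reader.pages:
        # extract_text() can return an empty string for image-only pages.
        text = page.extract_text()
        if text:
            full_text.append(text)
    return "\n\n".join(full_text)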
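For RAG use, it can help to keep page boundaries instead of joining everything into one string, so that each retrieved chunk can cite its page. The following is a minimal sketch under that assumption, not part of the original skill: the names build_page_records and extract_pdf_pages, and the record fields source, title, page, and text, are hypothetical. It uses the same pypdf calls as above, plus reader.metadata.title for the document title when one is present.

```python
from typing import Dict, Iterable, Iterator, List, Optional


def build_page_records(
    page_texts: Iterable[Optional[str]],
    source: str,
    title: Optional[str] = None,
) -> Iterator[Dict[str, object]]:
    """Yield one record per non-empty page (hypothetical record layout)."""
    for page_number, text in enumerate(page_texts, start=1):
        cleaned = (text or "").strip()
        if cleaned:  # skip pages with no extractable text
            yield {
                "source": source,
                "title": title,
                "page": page_number,  # 1-based page number
                "text": cleaned,
            }


def extract_pdf_pages(file_path: str) -> List[Dict[str, object]]:
    """Read a PDF and return per-page records instead of one joined string."""
    # Imported here so build_page_records stays usable without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(file_path)
    # reader.metadata is None when the file has no document information.
    title = reader.metadata.title if reader.metadata else None
    texts = (page.extract_text() for page in reader.pages)
    return list(build_page_records(texts, source=file_path, title=title))
```

Calling extract_pdf_pages("report.pdf") (a placeholder path) would return a list of dictionaries that a downstream chunker or embedder could consume while preserving the page number for citations.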
