
PDF Text Extractor

Custom / General Framework

Skill Description

Extracts text from a PDF file page by page with pypdf and returns it as a single string, with pages separated by blank lines. Pages that yield no text are skipped. Suited to RAG (Retrieval-Augmented Generation) pipelines.

Code / Definition File

skill_manifest.yaml / config.json
from pypdf import PdfReader

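def extract_pdf_text(file_path: str) -> str:
    """
    Reads a PDF file and returns the text of all pages as one string.

    Pages are separated by a blank line; pages with no extractable
    text (for example, scanned images) are skipped.
    """
    reader = PdfReader(file_path)
    full_text = []
    for page in reader.pages:
        # extract_text() can return an empty string for image-only pages.
        text = page.extract_text()
        if text:
            full_text.append(text)
    return "\n\n".join(full_text)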
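
For RAG use it is often helpful to keep each page separate and to carry its page number along, so that retrieved passages can be traced back to their source. The following is a minimal sketch of that variant, not part of the original skill. The names `build_page_records` and `extract_pdf_pages` and the record keys `source`, `page`, `title`, and `text` are hypothetical; the sketch assumes pypdf's `PdfReader.pages`, `page.extract_text()`, and `PdfReader.metadata` behave as in current pypdf releases.

```python
from typing import Dict, Iterable, List, Optional


def build_page_records(
    page_texts: Iterable[Optional[str]],
    source: str,
    title: Optional[str] = None,
) -> List[Dict[str, object]]:
    """Turn per-page text into records that keep the page number and source."""
    records: List[Dict[str, object]] = []
    for page_number, text in enumerate(page_texts, start=1):
        cleaned = (text or "").strip()
        if not cleaned:
            # Skip pages with no extractable text, but keep the original
            # page numbering for the pages that remain.
            continue
        records.append(
            {"source": source, "page": page_number, "title": title, "text": cleaned}
        )
    return records


def extract_pdf_pages(file_path: str) -> List[Dict[str, object]]:
    """Read a PDF with pypdf and return one record per non-empty page."""
    # Imported lazily so build_page_records works without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(file_path)
    # reader.metadata may be None when the PDF has no document info.
    metadata = reader.metadata
    title = metadata.title if metadata else None
    page_texts = (page.extract_text() for page in reader.pages)
    return build_page_records(page_texts, source=file_path, title=title)
```

Calling `extract_pdf_pages("report.pdf")` (the file name is a placeholder) would return a list of dictionaries, one per page that has text. Each record can then be chunked or embedded on its own, with `page` and `source` stored next to the vector for citation.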

Tags

pdf text-extraction rag
Author: AiAnyTool
Added on: 6/7/2026