PDF Text Extractor
Custom / General Framework
Skill Description
Extracts text from PDF files page by page using pypdf and returns it as a single string, with pages separated by blank lines. Pages with no extractable text are skipped. Suited to RAG (Retrieval-Augmented Generation) pipelines.
Code / Definition File
skill_manifest.yaml / config.json
from pypdf import PdfReader


def extract_pdf_text(file_path: str) -> str:
    """
    Reads a PDF file and returns the text of all pages combined.
    """
    reader = PdfReader(file_path)
    full_text = []
    for page in reader.pages:
        # extract_text() may yield nothing for image-only (scanned) pages
        text = page.extract_text()
        if text:
            full_text.append(text)
    # Separate pages with a blank line
    return "\n\n".join(full_text)
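In a RAG pipeline it can help to keep page numbers and basic document metadata next to the text, so retrieved passages can be traced back to their source. The following is a minimal sketch of one way to do that, not part of the skill itself: the helper names `extract_pages` and `extract_pdf_document` and the shape of the returned dictionaries are assumptions made for illustration. It relies only on `PdfReader.pages`, `page.extract_text()`, and `PdfReader.metadata` from pypdf.

```python
from typing import Any


def extract_pages(reader: Any) -> list[dict]:
    """Return one record per page that has extractable text.

    `reader` is any object exposing `.pages`, where each page has an
    `extract_text()` method (for example a pypdf `PdfReader`).
    """
    records = []
    # Page numbers are 1-based to match what a PDF viewer shows.
    for number, page in enumerate(reader.pages, start=1):
        text = page.extract_text()
        # Skip pages that produce no text (e.g. scanned images).
        if text and text.strip():
            records.append({"page": number, "text": text.strip()})
    return records


def extract_pdf_document(file_path: str) -> dict:
    """Hypothetical wrapper: per-page text plus title and author."""
    # Imported here so extract_pages() can be used without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(file_path)
    meta = reader.metadata  # may be None when the PDF has no info dictionary
    return {
        "title": meta.title if meta else None,
        "author": meta.author if meta else None,
        "pages": extract_pages(reader),
    }
```

Calling `extract_pdf_document("report.pdf")` (the file name is a placeholder) would return a dictionary whose `pages` list can be chunked or embedded per page, while the original `extract_pdf_text` stays the simpler choice when only the combined text is needed.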
Tags
pdf text-extraction rag
Author: AiAnyTool
Added on: 6/7/2026