Capture the Code.
Extract the Text.
Save the rendered DOM and extract all text from any webpage in a single API call. The most complete evidence capture available.
Why use it?
Screenshots capture the visual. Source extraction captures everything else.
Capture the post-JavaScript DOM as a .html file. Perfect for design theft evidence, SEO meta tag tracking, and legal archiving.
GPT-4o Vision reads every word on the page — titles, headings, paragraphs, buttons, links. Structured JSON output, ready to use.
Use both together and pay only 4 credits instead of 5. The most cost-effective way to capture complete page evidence.
One call, complete capture
Add save_source and extract_text to any screenshot request. Both work in sync and async mode.
- save_source: true — saves rendered DOM as .html to CDN
- extract_text: true — GPT-4o Vision reads all visible text
- SHA-256 hash for cryptographic proof of content
- Combo discount: 4 credits instead of 5 when using both
- Works with async polling and sync mode
Request:
{
  "url": "https://example.com",
  "save_source": true,
  "extract_text": true
}
Response:
{
  "status": "completed",
  "image_url": "https://cdn.goscreenapi.com/screenshots/uuid.png",
  "source_url": "https://cdn.goscreenapi.com/sources/uuid.html",
  "source_hash": "a3f4b2c1d5e6f7...",
  "extracted_text": {
    "title": "Example Domain",
    "headings": ["Example Domain"],
    "paragraphs": ["This domain is for use in illustrative examples..."],
    "buttons": [],
    "links": ["More information..."],
    "other": []
  },
  "credits_used": 4,
  "combo_discount": true
}
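The request and response shown above could be handled on the client roughly as follows. This is a minimal sketch, not official client code. Only the field names (url, save_source, extract_text, status, source_url, source_hash, extracted_text, credits_used) come from the example; the function names and the fetch_status callable are hypothetical, and the sketch assumes any status other than "completed" means the job is still pending. The HTTP call itself is left out because the endpoint and authentication scheme are not stated here.

```python
import time
from typing import Any, Callable, Dict


def build_capture_request(url: str, save_source: bool = True,
                          extract_text: bool = True) -> Dict[str, Any]:
    """Build the JSON body from the request example."""
    return {"url": url, "save_source": save_source, "extract_text": extract_text}


def wait_for_capture(fetch_status: Callable[[], Dict[str, Any]],
                     interval_s: float = 1.0,
                     max_attempts: int = 30,
                     sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll a job until it reports status "completed" (async mode).

    fetch_status is a hypothetical callable returning the latest job JSON.
    Assumption: any status other than "completed" means "still pending".
    """
    for _ in range(max_attempts):
        job = fetch_status()
        if job.get("status") == "completed":
            return job
        sleep(interval_s)
    raise TimeoutError("capture did not complete in time")


def summarize(job: Dict[str, Any]) -> Dict[str, Any]:
    """Pull out the fields most callers need from a completed job."""
    text = job.get("extracted_text", {})
    return {
        "source_url": job.get("source_url"),
        "source_hash": job.get("source_hash"),
        "title": text.get("title"),
        "heading_count": len(text.get("headings", [])),
        "credits_used": job.get("credits_used"),
    }
```

In sync mode the response is already complete, so summarize can be applied to it directly without polling.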
Built for evidence. Built for intelligence.
From legal archiving to competitive research — source extraction unlocks use cases screenshots alone can't cover.
Design Theft Evidence
Capture the exact HTML at the time of infringement. SHA-256 hash provides cryptographic proof of the original content.
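To use the hash as evidence, you would recompute it over the archived file and compare it with the value returned at capture time. The sketch below assumes that source_hash is the lowercase hex SHA-256 digest of the exact bytes of the saved .html file; that is a plausible reading of the response example, not something stated here. The function names are illustrative.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_source_hash(html_bytes: bytes, source_hash: str) -> bool:
    """Check archived HTML bytes against the hash recorded at capture time.

    Assumption: source_hash is a hex SHA-256 of the file's exact bytes.
    """
    return sha256_hex(html_bytes) == source_hash.strip().lower()
```

Hash the bytes exactly as downloaded. Decoding and re-encoding the HTML, or normalizing line endings, changes the digest.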
SEO Meta Tag Tracking
Monitor competitor meta titles, descriptions, and structured data changes over time. Catch updates the moment they happen.
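One way to track meta tags is to parse two saved .html captures and compare them. The sketch below uses only Python's standard html.parser module. The class and function names are illustrative, and it reads only the title element and meta tags carrying a name or property attribute.

```python
from html.parser import HTMLParser
from typing import Dict, Optional, Tuple


class MetaTagParser(HTMLParser):
    """Collect the <title> text and <meta name|property=... content=...> pairs."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta: Dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attr.get("name") or attr.get("property")
            if key and attr.get("content") is not None:
                self.meta[key.lower()] = attr["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_meta(html: str) -> Dict[str, str]:
    """Return {"title": ..., "meta:<name>": ...} for one saved page."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    result = {"title": parser.title.strip()}
    result.update({f"meta:{k}": v for k, v in parser.meta.items()})
    return result


def diff_meta(old: Dict[str, str],
              new: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each changed key to its (old, new) values; None means absent."""
    keys = set(old) | set(new)
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}
```

Running extract_meta on each capture and passing the results to diff_meta yields only the tags that were added, removed, or changed between the two snapshots.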
Content Archiving
Archive deleted social media posts, changed contract terms, or removed product listings with full text extraction.
Competitor Intelligence
Extract pricing tables, feature lists, and CTAs from competitor pages. Structured JSON makes analysis effortless.
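Because extracted_text arrives as structured JSON, two captures of the same page can be compared field by field. The sketch below assumes the shape shown in the response example (a title string plus the list fields headings, paragraphs, buttons, links, and other). The function name is illustrative.

```python
from typing import Any, Dict

# List-valued fields from the extracted_text response example.
LIST_FIELDS = ("headings", "paragraphs", "buttons", "links", "other")


def diff_extracted_text(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Report what was added or removed between two extracted_text objects."""
    changes: Dict[str, Any] = {}
    if old.get("title") != new.get("title"):
        changes["title"] = {"old": old.get("title"), "new": new.get("title")}
    for field in LIST_FIELDS:
        before = old.get(field, [])
        after = new.get(field, [])
        removed = [item for item in before if item not in after]
        added = [item for item in after if item not in before]
        if added or removed:
            changes[field] = {"added": added, "removed": removed}
    return changes
```

An empty result means the visible text is unchanged. An edited price or call to action appears as one removed entry and one added entry in the relevant field.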
Ready to capture everything?
250 free screenshots/month. No credit card required.
Extracting Rendered HTML
Our source code extractor delivers fully rendered HTML by executing JavaScript in headless Chromium. It waits for network idle and can use custom selectors, so the extracted DOM is complete. This approach handles dynamic elements, including shadow DOM content, and gives developers an accurate view of complex web pages.
Response times typically range from 500 ms to 2000 ms, depending on page complexity. Returned HTML files are usually between 300 KB and 1 MB. Supported output formats include JSON, XML, and TXT. The page source API handles up to 10 concurrent requests per second, which suits on-the-fly data processing.
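To stay under the limit of 10 requests per second mentioned above, a client can throttle itself before sending. The following is a minimal single-threaded sketch of a sliding-window limiter. The class name and parameters are illustrative, and it is not part of the API. The clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any period_s window (not thread-safe)."""

    def __init__(self, max_calls: int = 10, period_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.period_s = period_s
        self._clock = clock
        self._sleep = sleep
        self._stamps: Deque[float] = deque()

    def acquire(self) -> float:
        """Block until one more call fits in the window; return seconds waited."""
        waited = 0.0
        while True:
            now = self._clock()
            # Drop timestamps that have left the window.
            while self._stamps and now - self._stamps[0] >= self.period_s:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return waited
            # Window is full: wait until the oldest call expires.
            delay = self.period_s - (now - self._stamps[0])
            self._sleep(delay)
            waited += delay
```

Call acquire() immediately before each request. For multi-threaded use, the body of acquire would need a lock around it.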
How It Works
Rendering is handled by headless Chromium. For each request, it loads the page and executes its JavaScript before returning the source. As a result, you receive the JavaScript-rendered source rather than a static snapshot, which captures the page in its active state.
Extraction begins once the network is idle. Custom selectors pinpoint elements, even within shadow roots. The extracted DOM includes both visible and hidden elements, providing the complete page context needed for accurate data gathering from SPAs and other dynamic sites.
Use Cases
- Web Scraping: Capture dynamic data from SPAs.
- Archiving: Preserve fully interactive web pages.
- Content Analysis: Analyze updated page content in real time.
- SEO Monitoring: Ensure JavaScript content matches SEO standards.
- Security Auditing: Examine loaded scripts and resources.
Benefits for Developers
Rendered HTML extraction makes scraping SPAs straightforward and saves the time developers would otherwise spend on manual adjustments. The system also supports batch processing for higher throughput.
The source code extractor accurately captures all content produced by JavaScript execution, without extensive setup on your side. This simplifies tasks that depend on dynamic content, such as content monitoring and competitive analysis.
DOM Extraction: Why It Matters
Why focus on comprehensive DOM extraction? Accurate DOM data reflects what the user actually sees. For instance, interactive elements generated by JavaScript are essential for precise content analysis and archiving. A complete DOM, including shadow DOM content, also adds context to the data set.
Because the HTML comes from a headless browser, it reflects actual browsing conditions. This matters for applications that need in-depth insight into web interactions. The page source API offers capabilities beyond basic HTML parsing.
Ready to revolutionize your data extraction process? Implement our source code extractor today.
Frequently Asked Questions
How can I extract source code from a webpage using your API?
Our API allows you to retrieve the source code of a webpage in plain text format by specifying the URL in your request. There are no size limits on the source code extraction, and you can expect a response time of around 2-5 seconds, depending on the complexity of the page.
Can I get rendered HTML from your API?
Yes. Add save_source: true to a screenshot request, and the response will include a source_url pointing to the saved .html file. The file contains all dynamically generated content, reflecting the state of the DOM after JavaScript execution.
Does your API execute JavaScript before taking a screenshot?
Yes, our API utilizes a headless browser that fully executes JavaScript on the page, ensuring that the final screenshot captures all interactive elements. This process typically takes an additional 1-3 seconds, depending on the site's complexity.
What is the headless browser DOM in your API?
The headless browser DOM is the Document Object Model that our API's underlying headless browser builds after executing all scripts on the page. Because it reflects the rendered page, screenshots and saved source match the content users actually see.