Add Scrape Autopilot components #21238
Open: scrappilot wants to merge 4 commits into PipedreamHQ:master from scrappilot:add-scrape-autopilot
+329
−0
984d09f
Add Scrape Autopilot components
scrappilot 6f0a633
Address Scrape Autopilot component guidelines
scrappilot 3bc17c0
Add documentation links to Scrape Autopilot actions
scrappilot e0b8211
Merge branch 'master' into add-scrape-autopilot
ashwins01
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| # Overview | ||
| | ||
| Scrape Autopilot provides cost-efficient web scraping for public URLs with Markdown, HTML, and plain text outputs. Use it to extract clean page content from websites and feed the results into Pipedream workflows, AI steps, databases, alerts, and reporting pipelines. | ||
| | ||
| Authenticate with your Scrape Autopilot API key from https://www.scrappilot.com/dashboard. | ||
| | ||
| # Example Use Cases | ||
| | ||
| - **AI-ready content extraction**: Scrape a URL as Markdown, then send the clean content to an AI step for summarization, classification, or entity extraction. | ||
| - **Batch website monitoring**: Scrape a short list of URLs on a schedule and compare the returned text or Markdown against previous runs. | ||
| - **Lead and research workflows**: Extract readable page content from company websites, product pages, or public articles, then store structured results in Airtable, Google Sheets, or a database. |
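The "Batch website monitoring" use case above depends on comparing a new scrape against a stored one. A minimal sketch of that comparison follows. The helper names `normalizeContent` and `hasContentChanged` are hypothetical and not part of this PR, and where the previous snapshot is stored (for example a Pipedream data store) is left to the workflow.

```javascript
// Hypothetical helper (not part of the PR): collapse runs of whitespace so
// cosmetic reflows of the scraped text are not reported as changes.
function normalizeContent(text) {
  return String(text ?? "").replace(/\s+/g, " ").trim();
}

// Returns true when the newly scraped content differs from the stored snapshot.
function hasContentChanged(previous, current) {
  return normalizeContent(previous) !== normalizeContent(current);
}

// Only whitespace differs, so this is not treated as a change.
console.log(hasContentChanged("# Pricing\n\n$10 / month", "# Pricing\n$10 / month")); // false
// The price differs, so this is a change.
console.log(hasContentChanged("# Pricing\n$10 / month", "# Pricing\n$12 / month")); // true
```

Comparing normalized strings keeps the sketch dependency-free. A workflow that stores many snapshots might prefer to store a hash of the normalized text instead.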
25 changes: 25 additions & 0 deletions
components/scrape_autopilot/actions/get-balance/get-balance.mjs
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,25 @@ | ||
| import scrapeAutopilot from "../../scrape_autopilot.app.mjs"; | ||
| | ||
| export default { | ||
| name: "Get Balance", | ||
| description: "Check your Scrape Autopilot credit balance to keep cost-efficient scraping workflows under control. [See the documentation](https://www.scrappilot.com/docs)", | ||
| key: "scrape_autopilot-get-balance", | ||
| version: "0.0.1", | ||
| annotations: { | ||
| destructiveHint: false, | ||
| openWorldHint: true, | ||
| readOnlyHint: true, | ||
| }, | ||
| type: "action", | ||
| props: { | ||
| scrapeAutopilot, | ||
| }, | ||
| async run({ $ }) { | ||
| const data = await this.scrapeAutopilot.getBalance({ | ||
| $, | ||
| }); | ||
| | ||
| $.export("$summary", `Credit balance: ${data.credits}`); | ||
| return data; | ||
| }, | ||
| }; |
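A later workflow step could use the balance returned by this action to decide whether to keep scraping. The sketch below assumes the response carries a numeric `credits` field, which is what the action's `$summary` reads. The helper name `hasEnoughCredits` and the threshold are hypothetical.

```javascript
// Hypothetical guard (not part of the PR). Assumes the status response has a
// numeric `credits` field, as read by the action's $summary.
function hasEnoughCredits(status, minimumCredits) {
  const credits = Number(status?.credits);
  // A missing or non-numeric balance is treated as "not enough".
  return Number.isFinite(credits) && credits >= minimumCredits;
}

console.log(hasEnoughCredits({ credits: 120 }, 50)); // true
console.log(hasEnoughCredits({ credits: 10 }, 50));  // false
console.log(hasEnoughCredits({}, 1));                // false
```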
45 changes: 45 additions & 0 deletions
components/scrape_autopilot/actions/scrape-url/scrape-url.mjs
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,45 @@ | ||
| import scrapeAutopilot from "../../scrape_autopilot.app.mjs"; | ||
| | ||
| export default { | ||
| name: "Scrape URL", | ||
| description: "Cost-efficiently scrape one public URL and return Markdown, HTML, or text. [See the documentation](https://www.scrappilot.com/docs)", | ||
| key: "scrape_autopilot-scrape-url", | ||
| version: "0.0.1", | ||
| annotations: { | ||
| destructiveHint: false, | ||
| openWorldHint: true, | ||
| readOnlyHint: false, | ||
| }, | ||
| type: "action", | ||
| props: { | ||
| scrapeAutopilot, | ||
| url: { | ||
| type: "string", | ||
| label: "URL", | ||
| description: "The fully qualified public URL to scrape.", | ||
| }, | ||
| format: { | ||
| propDefinition: [ | ||
| scrapeAutopilot, | ||
| "format", | ||
| ], | ||
| }, | ||
| js: { | ||
| propDefinition: [ | ||
| scrapeAutopilot, | ||
| "js", | ||
| ], | ||
| }, | ||
| }, | ||
| async run({ $ }) { | ||
| const data = await this.scrapeAutopilot.scrapeUrl({ | ||
| $, | ||
| url: this.url, | ||
| format: this.format, | ||
| js: this.js, | ||
| }); | ||
| | ||
| $.export("$summary", `Scraped ${this.url}`); | ||
| return data; | ||
| }, | ||
| }; |
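The `url` prop is described as a fully qualified public URL, but the action as written passes the value straight to the API. One possible pre-check, sketched here with the standard `URL` constructor, rejects values without an `http:` or `https:` scheme. The helper name `isFullyQualifiedHttpUrl` is hypothetical, and the check cannot tell whether a host is publicly reachable.

```javascript
// Hypothetical pre-check (not part of the PR): true only for absolute
// http(s) URLs. It does not verify that the host is public or reachable.
function isFullyQualifiedHttpUrl(value) {
  try {
    const { protocol } = new URL(value);
    return protocol === "http:" || protocol === "https:";
  } catch {
    // new URL() throws a TypeError for relative or malformed input.
    return false;
  }
}

console.log(isFullyQualifiedHttpUrl("https://example.com/pricing")); // true
console.log(isFullyQualifiedHttpUrl("example.com/pricing"));         // false
console.log(isFullyQualifiedHttpUrl("ftp://example.com/file"));      // false
```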
65 changes: 65 additions & 0 deletions
components/scrape_autopilot/actions/scrape-urls/scrape-urls.mjs
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,65 @@ | ||
| import { ConfigurationError } from "@pipedream/platform"; | ||
| import scrapeAutopilot from "../../scrape_autopilot.app.mjs"; | ||
| | ||
| const MAX_URLS = 10; | ||
| | ||
| export default { | ||
| name: "Scrape URLs", | ||
| description: "Cost-efficiently scrape up to 10 public URLs and return one result per URL. [See the documentation](https://www.scrappilot.com/docs)", | ||
| key: "scrape_autopilot-scrape-urls", | ||
| version: "0.0.1", | ||
| annotations: { | ||
| destructiveHint: false, | ||
| openWorldHint: true, | ||
| readOnlyHint: false, | ||
| }, | ||
| type: "action", | ||
| props: { | ||
| scrapeAutopilot, | ||
| urls: { | ||
| type: "string[]", | ||
| label: "URLs", | ||
| description: "Public URLs to scrape. Maximum 10.", | ||
| }, | ||
| format: { | ||
| propDefinition: [ | ||
| scrapeAutopilot, | ||
| "format", | ||
| ], | ||
| }, | ||
| js: { | ||
| propDefinition: [ | ||
| scrapeAutopilot, | ||
| "js", | ||
| ], | ||
| }, | ||
| }, | ||
| async run({ $ }) { | ||
| const urls = (this.urls || []).map((url) => url.trim()).filter(Boolean); | ||
| | ||
| if (!urls.length) { | ||
| throw new ConfigurationError("Provide at least one URL."); | ||
| } | ||
| | ||
| if (urls.length > MAX_URLS) { | ||
| throw new ConfigurationError( | ||
| `Scrape Autopilot batch scraping is limited to ${MAX_URLS} URLs.`, | ||
| ); | ||
| } | ||
| | ||
| const data = await this.scrapeAutopilot.scrapeUrls({ | ||
| $, | ||
| urls, | ||
| format: this.format, | ||
| js: this.js, | ||
| }); | ||
| | ||
| $.export( | ||
| "$summary", | ||
| `Scraped ${urls.length} URL${urls.length === 1 | ||
| ? "" | ||
| : "s"}`, | ||
| ); | ||
| return data; | ||
| }, | ||
| }; | ||
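The action above throws a `ConfigurationError` when more than 10 URLs are supplied. A workflow that starts from a longer list could split it into batches first and run the action once per batch. The sketch below shows one way to do that. The helper `chunkUrls` is hypothetical and not part of this PR; it applies the same trim-and-filter cleanup as the action before splitting.

```javascript
// Hypothetical helper (not part of the PR): clean a URL list the same way the
// action does, then split it into batches no larger than `size`.
function chunkUrls(urls, size = 10) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const cleaned = (urls || []).map((url) => String(url).trim()).filter(Boolean);
  const batches = [];
  for (let i = 0; i < cleaned.length; i += size) {
    batches.push(cleaned.slice(i, i + size));
  }
  return batches;
}

// 23 URLs become three batches of 10, 10, and 3.
const urls = Array.from({ length: 23 }, (_, i) => `https://example.com/page/${i + 1}`);
console.log(chunkUrls(urls).map((batch) => batch.length)); // [ 10, 10, 3 ]
```

Each batch consumes credits separately, so checking the balance between batches may be worthwhile.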
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,20 @@ | ||
| { | ||
| "name": "@pipedream/scrape_autopilot", | ||
| "version": "0.0.1", | ||
| "description": "Pipedream Scrape Autopilot Components", | ||
| "main": "scrape_autopilot.app.mjs", | ||
| "keywords": [ | ||
| "pipedream", | ||
| "scrape_autopilot", | ||
| "web-scraping", | ||
| "markdown" | ||
| ], | ||
| "homepage": "https://pipedream.com/apps/scrape_autopilot", | ||
| "author": "Pipedream <support@pipedream.com> (https://pipedream.com/)", | ||
| "publishConfig": { | ||
| "access": "public" | ||
| }, | ||
| "dependencies": { | ||
| "@pipedream/platform": "^3.1.1" | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,163 @@ | ||
| import { axios } from "@pipedream/platform"; | ||
| | ||
| const FORMATS = [ | ||
| { | ||
| label: "Markdown", | ||
| value: "md", | ||
| }, | ||
| { | ||
| label: "HTML", | ||
| value: "html", | ||
| }, | ||
| { | ||
| label: "Text", | ||
| value: "text", | ||
| }, | ||
| ]; | ||
| | ||
| export default { | ||
| type: "app", | ||
| app: "scrape_autopilot", | ||
| propDefinitions: { | ||
| scrapeAutopilot: { | ||
| type: "app", | ||
| app: "scrape_autopilot", | ||
| label: "Scrape Autopilot", | ||
| description: "Connect your Scrape Autopilot account.", | ||
| }, | ||
| format: { | ||
| type: "string", | ||
| label: "Output Format", | ||
| description: "The response format to return.", | ||
| options: FORMATS, | ||
| optional: true, | ||
| default: "md", | ||
| }, | ||
| js: { | ||
| type: "boolean", | ||
| label: "Enable JavaScript Rendering", | ||
| description: "Use JavaScript rendering for dynamic pages. This consumes more credits.", | ||
| optional: true, | ||
| default: false, | ||
| }, | ||
| }, | ||
| | ||
| methods: { | ||
| /** | ||
| * Returns the Scrape Autopilot API base URL. | ||
| * | ||
| * @returns {string} Base URL for Scrape Autopilot API requests. | ||
| */ | ||
| _baseUrl() { | ||
| return "https://www.scrappilot.com"; | ||
| }, | ||
| /** | ||
| * Builds authorization headers for Scrape Autopilot API requests. | ||
| * | ||
| * @returns {object} Headers containing the connected account API key. | ||
| */ | ||
| _authHeaders() { | ||
| return { | ||
| Authorization: this.$auth.api_key, | ||
| }; | ||
| }, | ||
| /** | ||
| * Makes an authenticated Scrape Autopilot API request. | ||
| * | ||
| * @param {object} opts - Request options. | ||
| * @param {*} opts.$ - Pipedream execution context. | ||
| * @param {string} opts.path - API path beginning with `/`. | ||
| * @param {object} [opts.headers] - Additional request headers. | ||
| * @returns {Promise<object>} Parsed API response body. | ||
| */ | ||
| async _makeRequest({ | ||
| $, | ||
| path, | ||
| headers, | ||
| ...args | ||
| }) { | ||
| return axios($, { | ||
| ...args, | ||
| baseURL: this._baseUrl(), | ||
| url: path, | ||
| headers: { | ||
| ...this._authHeaders(), | ||
| ...headers, | ||
| }, | ||
| }); | ||
| }, | ||
| /** | ||
| * Scrapes one public URL. | ||
| * | ||
| * @param {object} opts - Scrape request options. | ||
| * @param {*} opts.$ - Pipedream execution context. | ||
| * @param {string} opts.url - Fully qualified public URL to scrape. | ||
| * @param {string} [opts.format] - Output format: `md`, `html`, or `text`. | ||
| * @param {boolean} [opts.js] - Whether to enable JavaScript rendering. | ||
| * @returns {Promise<object>} Scrape result. | ||
| */ | ||
| async scrapeUrl({ | ||
| $, | ||
| url, | ||
| format, | ||
| js, | ||
| }) { | ||
| return this._makeRequest({ | ||
| $, | ||
| method: "POST", | ||
| path: "/api/scrape", | ||
| headers: { | ||
| "Content-Type": "application/json", | ||
| }, | ||
| data: { | ||
| url, | ||
| format, | ||
| js, | ||
| }, | ||
| }); | ||
| }, | ||
| /** | ||
| * Scrapes multiple public URLs. | ||
| * | ||
| * @param {object} opts - Batch scrape request options. | ||
| * @param {*} opts.$ - Pipedream execution context. | ||
| * @param {string[]} opts.urls - Fully qualified public URLs to scrape. | ||
| * @param {string} [opts.format] - Output format: `md`, `html`, or `text`. | ||
| * @param {boolean} [opts.js] - Whether to enable JavaScript rendering. | ||
| * @returns {Promise<object>} Batch scrape result. | ||
| */ | ||
| async scrapeUrls({ | ||
| $, | ||
| urls, | ||
| format, | ||
| js, | ||
| }) { | ||
| return this._makeRequest({ | ||
| $, | ||
| method: "POST", | ||
| path: "/api/scrape", | ||
| headers: { | ||
| "Content-Type": "application/json", | ||
| }, | ||
| data: { | ||
| urls, | ||
| format, | ||
| js, | ||
| }, | ||
| }); | ||
| }, | ||
| /** | ||
| * Gets the remaining credit balance. | ||
| * | ||
| * @param {object} opts - Balance request options. | ||
| * @param {*} opts.$ - Pipedream execution context. | ||
| * @returns {Promise<object>} Account status and credit balance. | ||
| */ | ||
| async getBalance({ $ }) { | ||
| return this._makeRequest({ | ||
| $, | ||
| method: "GET", | ||
| path: "/api/status", | ||
| }); | ||
| }, | ||
| }, | ||
| }; | ||
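The order of the object spreads in `_makeRequest` determines which values win. Per-call `headers` are spread after the auth headers, so they can add to or override them, while `baseURL` and `url` are set after `...args`, so a caller cannot replace them. The standalone sketch below mirrors that merging without calling the API. The function `buildRequestConfig` is an illustrative stand-in that takes the API key as an argument, whereas the real method reads `this.$auth.api_key`.

```javascript
// Illustrative stand-in for the option merging in _makeRequest (not the PR's
// code). The API key is passed in explicitly so the sketch runs on its own.
function buildRequestConfig(apiKey, { path, headers, ...args }) {
  return {
    ...args, // method, data, and any other request options
    baseURL: "https://www.scrappilot.com", // set after ...args, so it always wins
    url: path,
    headers: {
      Authorization: apiKey, // auth header first
      ...headers, // per-call headers are applied on top
    },
  };
}

const config = buildRequestConfig("demo-key", {
  method: "POST",
  path: "/api/scrape",
  headers: { "Content-Type": "application/json" },
  data: { url: "https://example.com", format: "md", js: false },
});

console.log(config.url); // /api/scrape
console.log(config.headers); // { Authorization: 'demo-key', 'Content-Type': 'application/json' }
```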