Help Center

UCRAWLER DOCUMENTATION

How uCrawler works

uCrawler automatically parses articles from media website URLs (hereinafter "sources"), structures the data as JSON, and delivers it straight to your website, mobile app, or analytics system via API (JSON), XML, RSS, Webhooks, or direct export to your database (MySQL, PostgreSQL, Oracle).

The uCrawler AI algorithm groups similar news into threads, as Google News does.

uCrawler is completely automated. All you need to do is choose your news sources (the website URLs you want to parse).

Data structure

uCrawler combines all data parsed from different sources into a structured, unified view. Each article has the following fields:

"domain" — article source domain name
"api_url" — API URL to GET this article from the uCrawler API
"html" — article text with HTML tags
"text" — article text without HTML tags
"lang" — article language
"url" — article original URL
"iframes" — iframes from the article
"pub_time" — UTC timestamp of when the article was parsed by uCrawler
"meta_images" — images from the article META tags
"score" — internal uCrawler popularity score (calculated only among collected articles)
"title" — article title
"images" — array with all images that uCrawler was able to parse from the article. uCrawler ignores images smaller than 300x300 px. Each image has the following fields:

"url" — image original URL
"caption" — text under the image
"length" — size of the image (bytes)
"format" — image format
"width" — image width
"height" — image height

"sources" — array with the source and group IDs and names for this article. Each entry has the following fields:

"group" — ID of the group that contains the source of this article
"group_name" — name of the group that contains the source of this article
"source" — source ID of this article
"source_name" — source name of this article

"icons" — array of all icons from the article's page
"video" — array with all videos that uCrawler was able to parse from the article
"id" — unique article ID in uCrawler

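To make the nesting of the fields above concrete, here is a minimal Python sketch that reads one article and pulls out a few values. The field names come from the list above; every value, and the type of each value (for example numeric IDs), is an assumption made up for illustration and not taken from a real uCrawler response.

```python
import json

# Hypothetical article: field names follow the list above,
# all values and value types are invented for this example.
SAMPLE = """
{
  "id": "a1",
  "domain": "example.com",
  "url": "https://example.com/news/1",
  "title": "Example headline",
  "lang": "en",
  "text": "Plain article text.",
  "html": "<p>Plain article text.</p>",
  "images": [
    {"url": "https://example.com/a.jpg", "caption": "Photo A",
     "length": 48213, "format": "jpeg", "width": 1200, "height": 800},
    {"url": "https://example.com/b.png", "caption": "",
     "length": 90555, "format": "png", "width": 640, "height": 480}
  ],
  "sources": [
    {"group": 7, "group_name": "Sport news",
     "source": 42, "source_name": "Example Sport"}
  ]
}
"""


def summarize(article):
    """Flatten the nested parts of one article into a small summary dict."""
    images = article.get("images", [])
    # Pick the image with the largest pixel area, or None if there are none.
    largest = max(images, key=lambda i: i["width"] * i["height"], default=None)
    return {
        "id": article["id"],
        "title": article["title"],
        "groups": [s["group_name"] for s in article.get("sources", [])],
        "image_count": len(images),
        "largest_image": largest["url"] if largest else None,
    }


article = json.loads(SAMPLE)
summary = summarize(article)
print(summary)
```

Note that "url" appears both at the article level and inside each "images" entry, so the two must be read from their own level of the structure.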
UCRAWLER HELP CENTER

Account Tutorial

1

Dashboard

Add the media website URLs you want to parse, or choose from our library, to get the latest or historical news articles.

Just create a Group (e.g. "Sport news") and add page URLs. uCrawler then automatically collects all news from those pages.

3

Queries

uCrawler's powerful search query capabilities let you slice and dice the data according to your needs.

Search queries are fast, and you can preview the results immediately with the visual Preview or a cURL sample.

1

Sources

Choose the group or specific source to include in your query.

2

Start Date and End Date (format: YYYY-MM-DD)

Choose a specific time frame for the collected content. The default time frame is 30 days. To increase it, send a request to public@ucrawler.app.

3

Keywords

Get more accurate search results by adding keywords that may appear in the article title or article text.

4

Article text truncation

Truncate the article text at the specified number of words. By default, the query returns the full article text.

5

Format

The output format: JSON or XML.

6

Size

The number of posts returned per request, from 1 to 200 (default is 100). This limit can be increased on request.
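The query options above can be sketched as a small Python helper that checks each value against the documented limits before a request is made. This is only an illustration: the function names and the dictionary keys are hypothetical and are not the real uCrawler API parameter names, and no request is sent.

```python
from datetime import datetime


def build_query(sources, start, end, keywords=(), truncate=None,
                fmt="json", size=100):
    """Check the query options described above and return them as a dict.

    The dict keys are hypothetical names, not documented uCrawler parameters.
    """
    # strptime raises ValueError if a date does not match YYYY-MM-DD.
    start_d = datetime.strptime(start, "%Y-%m-%d").date()
    end_d = datetime.strptime(end, "%Y-%m-%d").date()
    if end_d < start_d:
        raise ValueError("end date is before start date")
    if fmt not in ("json", "xml"):
        raise ValueError("format must be 'json' or 'xml'")
    if not 1 <= size <= 200:
        raise ValueError("size must be between 1 and 200")
    if truncate is not None and truncate < 1:
        raise ValueError("truncate must be a positive number of words")
    return {
        "sources": list(sources),
        "start_date": start_d.isoformat(),
        "end_date": end_d.isoformat(),
        "keywords": list(keywords),
        "truncate": truncate,
        "format": fmt,
        "size": size,
    }


def truncate_words(text, max_words):
    """Client-side illustration of cutting text after max_words words."""
    return " ".join(text.split()[:max_words])


query = build_query([42], "2024-01-01", "2024-01-31", keywords=["football"])
print(query)
print(truncate_words("one two three four five", 3))
```

The truncation helper splits on whitespace and rejoins with single spaces. How uCrawler itself counts words is not stated here, so treat it as an approximation of the option's effect.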

4

RSS feeds

Set up RSS feeds based on your sources and keywords, then add your customized news to your website using news RSS widgets.

1

Sources

Choose the group or specific source to include in your RSS feed.

2

Days

Choose a specific time frame for the collected content. By default, an RSS feed can cover the last 1 to 7 days. To increase this value, send a request to public@ucrawler.app.

3

Keywords

Get more accurate results by adding keywords that may appear in the article title or article text.

4

Article text truncation

Truncate the article text at the specified number of words. By default, the full article text is returned.

5

Size

The number of posts returned per request, from 1 to 200 (default is 100). This limit can be increased on request.

6

RSS title

Add a name for your RSS feed.
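If you prefer to read a feed in your own code rather than through a widget, a feed can be parsed with the Python standard library. The sketch below assumes a plain RSS 2.0 layout (channel, item, title, link). The sample feed is invented for illustration, and the exact elements a uCrawler feed contains may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, invented for this example.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Sport news</title>
    <item>
      <title>First headline</title>
      <link>https://example.com/news/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/news/2</link>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return the title and link of every item in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.findall("./channel/item"):
        items.append({
            # findtext returns the default when the child element is missing.
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


items = read_items(SAMPLE_FEED)
for entry in items:
    print(entry["title"], entry["link"])
```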

6

Export data

We can connect your Elasticsearch, PostgreSQL, MySQL, MariaDB, Oracle, Microsoft SQL Server, or MongoDB database, or a Webhook, for automatic data export.

Please contact us at public@ucrawler.app to set up this feature.
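As a rough idea of what a destination table for exported articles might look like, here is a sketch that uses SQLite from the Python standard library as a stand-in for your database. The table layout, the column names, and the store_articles helper are all assumptions for illustration. The actual export schema is agreed with uCrawler when the feature is set up.

```python
import sqlite3


def store_articles(conn, articles):
    """Insert articles into a hypothetical table, replacing rows with the same id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "id TEXT PRIMARY KEY, url TEXT, title TEXT, lang TEXT, body TEXT)"
    )
    # Named placeholders are filled from the matching keys of each dict.
    conn.executemany(
        "INSERT OR REPLACE INTO articles (id, url, title, lang, body) "
        "VALUES (:id, :url, :title, :lang, :text)",
        articles,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
sample = {
    "id": "a1",
    "url": "https://example.com/news/1",
    "title": "Example headline",
    "lang": "en",
    "text": "Plain article text.",
}
store_articles(conn, [sample])
# Storing the same article again updates the row instead of duplicating it.
store_articles(conn, [dict(sample, title="Updated headline")])
row_count = conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
stored_title = conn.execute("SELECT title FROM articles").fetchone()[0]
print(row_count, stored_title)
```

Using the article "id" as the primary key keeps repeated deliveries of the same article from creating duplicate rows.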

8

API Docs

The uCrawler API documentation is available to all customers. Through the API you can create Groups and Sources and work with search Queries.

Please contact us at public@ucrawler.app if you need help with the API.
