
Posted 11 months ago by Jakob Gillich

Backendless Web Scraping Using the Everbase GraphQL API

Welcome to our third update! This time around, we have substantially improved the in-schema documentation and removed the fields that were deprecated in the previous release three months ago. We have also added a powerful new feature: web scraping.

Web scraping lets you fetch and parse web pages through the GraphQL API, with no backend of your own required. Here's how to get the submissions currently on the Hacker News front page:

{
  url(url: "https://news.ycombinator.com") {
    htmlDocument {
      title
      body {
        submissions: all(selector: "tr.athing") {
          rank: text(selector: "span.rank")
          text(selector: "a.storylink")
          url: attribute(selector: "a.storylink", name: "href")
          attrs: next {
            score: text(selector: "span.score")
            user: text(selector: "a.hnuser")
            comments: text(selector: "a:nth-of-type(3)")
          }
        }
      }
    }
  }
}

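In this query, `all` collects every element matching the CSS selector `tr.athing` (one per submission). `text` and `attribute` read the text or a named attribute of a matching element inside it. `next` moves to the element that follows each match. This is needed because Hacker News puts each submission's score, author and comment count in the table row after its title row. Executing this query returns a result like this: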

{
  "data": {
    "url": {
      "htmlDocument": {
        "title": "Hacker News",
        "body": {
          "submissions": [
            {
              "rank": "1.",
              "text": "Anxiety Driven Development",
              "url": "https://andreschweighofer.com/agile/anxiety-in-product-development/",
              "attrs": {
                "score": "106 points",
                "user": "fidrelity",
                "comments": "31 comments"
              }
            },
            {
              "rank": "2.",
              "text": "I'm resigning from my job at Facebook",
              "url": "https://www.facebook.com/timothy.j.aveni/posts/3006224359465567",
              "attrs": {
                "score": "73 points",
                "user": "dredmorbius",
                "comments": "8 comments"
              }
            },
            {
              "rank": "3.",
              "text": "Blur Tools for Signal",
              "url": "https://signal.org/blog/blur-tools/",
              "attrs": {
                "score": "204 points",
                "user": "tosh",
                "comments": "106 comments"
              }
            },
            // ...
          ]
        }
      }
    }
  }
}
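Because this is ordinary GraphQL, a web page can send the query itself. The TypeScript sketch below assumes the common GraphQL-over-HTTP convention: a JSON POST whose body carries a `query` field.

`ENDPOINT` is a placeholder, since this post does not give the API URL. `buildGraphQLPost` is a hypothetical helper, not part of any Everbase SDK.

```typescript
// Sketch only: ENDPOINT is a placeholder because this post does not state the
// API URL. Substitute the real Everbase GraphQL endpoint.
const ENDPOINT = "https://example.com/graphql";

// The Hacker News query from above, stored as a plain string.
const HACKER_NEWS_QUERY = `{
  url(url: "https://news.ycombinator.com") {
    htmlDocument {
      title
      body {
        submissions: all(selector: "tr.athing") {
          rank: text(selector: "span.rank")
          text(selector: "a.storylink")
          url: attribute(selector: "a.storylink", name: "href")
          attrs: next {
            score: text(selector: "span.score")
            user: text(selector: "a.hnuser")
            comments: text(selector: "a:nth-of-type(3)")
          }
        }
      }
    }
  }
}`;

// Options object for fetch(). Assumption: the API accepts the usual
// GraphQL-over-HTTP payload, a JSON body of the form { "query": "..." }.
interface GraphQLPostInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: wraps any query string in a JSON POST request.
function buildGraphQLPost(query: string): GraphQLPostInit {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query }),
  };
}

const hnRequest = buildGraphQLPost(HACKER_NEWS_QUERY);

// In a browser (not executed here):
//   const response = await fetch(ENDPOINT, hnRequest);
//   const { data } = await response.json();
//   console.log(data.url.htmlDocument.body.submissions);
```

Keeping the request construction in one small function means every other query can reuse it unchanged.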

This makes it easy to pull data from websites that do not offer an API of their own.
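Every value in the example response is a string, as read from the page. If you want numbers for sorting or filtering, a small client-side pass can convert them.

The TypeScript sketch below mirrors the aliases from the Hacker News query. The type and helper names (`ScrapedSubmission`, `leadingInt`, `normalizeSubmission`) are illustrative. Typing each field as nullable is an assumption for rows where a selector may not match, such as job postings, which have no score or author.

```typescript
// Shapes mirror the aliases in the query (rank, text, url, attrs, ...).
// Assumption: a selector with no match comes back as null.
interface ScrapedAttrs {
  score: string | null;
  user: string | null;
  comments: string | null;
}

interface ScrapedSubmission {
  rank: string | null;
  text: string | null;
  url: string | null;
  attrs: ScrapedAttrs | null;
}

interface Submission {
  rank: number | null;
  title: string;
  url: string | null;
  score: number | null;
  user: string | null;
  comments: number | null;
}

// Extracts the leading integer from strings such as "1.", "106 points" or
// "31 comments"; returns null when there is no leading number (e.g. "discuss").
function leadingInt(value: string | null): number | null {
  if (value === null) return null;
  const match = /^\s*(\d+)/.exec(value);
  return match ? Number(match[1]) : null;
}

// Converts one scraped row into typed values.
function normalizeSubmission(s: ScrapedSubmission): Submission {
  return {
    rank: leadingInt(s.rank),
    title: s.text ?? "",
    url: s.url,
    score: leadingInt(s.attrs?.score ?? null),
    user: s.attrs?.user ?? null,
    comments: leadingInt(s.attrs?.comments ?? null),
  };
}

// First entry of the example response shown above.
const firstSubmission: ScrapedSubmission = {
  rank: "1.",
  text: "Anxiety Driven Development",
  url: "https://andreschweighofer.com/agile/anxiety-in-product-development/",
  attrs: { score: "106 points", user: "fidrelity", comments: "31 comments" },
};

console.log(normalizeSubmission(firstSubmission));
```

Mapping a missing number to `null` rather than `0` keeps "no data" distinct from a genuine zero. On Hacker News, a submission showing "discuss" has no comments yet, so you may prefer to map that case to 0.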

All Changes

Here is a list of all changes:

✖  Field countryName (deprecated) was removed from object type City
✖  Field capitalName (deprecated) was removed from object type Country
✖  Field A (deprecated) was removed from object type DNSRecords
✖  Field AAAA (deprecated) was removed from object type DNSRecords
✖  Field CNAME (deprecated) was removed from object type DNSRecords
✖  Field MX (deprecated) was removed from object type DNSRecords
⚠  Default value 1 was added to argument amount on field Currency.convert
⚠  Default value 18437736874454810000 was added to argument high on field Random.float
⚠  Default value -1.7976931348623157e+308 was added to argument low on field Random.float
⚠  Default value 2147483647 was added to argument high on field Random.int
⚠  Default value -2147483648 was added to argument low on field Random.int
⚠  Default value 16 was added to argument length on field Random.string
✔  Object type City has description A city is a large human settlement.
✔  Field City.continent has description The continent.
✔  Field City.country has description The country.
✔  Field City.geonamesID has description The Geonames.org ID.
✔  Field City.id has description The Wikidata ID.
✔  Field City.location has description The location.
✔  Field City.name has description The name.
✔  Field City.population has description The population.
✔  Field timeZone was added to object type City
✔  Field timeZoneDST was added to object type City
✔  Object type Client has description Information about the client that sent the request.
✔  Field Client.ipAddress has description The IP address.
✔  Field Client.userAgent has description The user agent.
✔  Object type Continent has description A continent is one of several very large landmasses.
✔  Field Continent.geonamesID has description The Geonames.org ID.
✔  Field Continent.id has description The Wikidata ID.
✔  Field Continent.name has description The name.
✔  Field Continent.population has description The population.
✔  Object type Coordinates has description Geographic coordinates.
✔  Field Coordinates.lat has description Latitude.
✔  Field Coordinates.long has description Longitude
✔  Object type Country has description A sovereign state.
✔  Field Country.alpha2Code has description The ISO 3166-1 alpha-2 code.
✔  Field Country.alpha3Code has description The ISO 3166-1 alpha-3 code.
✔  Field Country.callingCodes has description Calling codes.
✔  Field Country.capital has description The capital city.
✔  Field Country.cities has description All cities of the country.
✔  Field Country.continent has description The continent the country is located in.
✔  Field Country.currencies has description All official currencies of the country.
✔  Field Country.geonamesID has description The Geonames.org ID.
✔  Field Country.id has description The Wikidata ID.
✔  Field Country.languages has description All official languages of the country.
✔  Field Country.location has description The location.
✔  Field Country.name has description The name.
✔  Field Country.population has description The population.
✔  Field vatRate was added to object type Country
✔  Field Currency.countries has description Countries that use the currency.
✔  Field Currency.id has description The Wikidata ID.
✔  Field Currency.isoCode has description The ISO 4217 code.
✔  Field Currency.name has description The name.
✔  Field Currency.unitSymbols has description Unit symbols.
✔  Object type DomainName has description Domain Name of the Domain Name System (DNS).
✔  Field a was added to object type DomainName
✔  Field aaaa was added to object type DomainName
✔  Field cname was added to object type DomainName
✔  Field mx was added to object type DomainName
✔  Field DomainName.name has description The domain name.
✔  Field DomainName.records is deprecated
✔  Field DomainName.records has deprecation reason Use fields on domainName itself
✔  Field EmailAddress.address has description The email address.
✔  Field EmailAddress.domainName has description The host as a domain name.
✔  Field ok was added to object type EmailAddress
✔  Field EmailServiceProvider.domainName has description The domain name.
✔  Field smtpOk was added to object type EmailServiceProvider
✔  Type HTMLDocument was added
✔  Type HTMLNode was added
✔  Description on type IPAddress has changed from "Can be either a IPv4 or a IPv6 address. This product includes GeoLite2 data created by MaxMind, available from www.maxmind.com." to "Internet Protocol address. Can be either a IPv4 or a IPv6 address. This product includes GeoLite2 data created by MaxMind, available from www.maxmind.com."
✔  Field IPAddress.address has description The IP address.
✔  Field IPAddress.city has description The city this IP address belongs to.
✔  Field IPAddress.country has description The country this IP address belongs to.
✔  Field IPAddress.type has description The IP address type.
✔  Field Language.alpha2Code has description The ISO 639-1 code.
✔  Field Language.countries has description The countries that use the language.
✔  Field Language.id has description The Wikidata ID.
✔  Field Language.name has description The name.
✔  Field MXRecord.exchange has description The domain name.
✔  Field MXRecord.preference has description The preference value.
✔  Object type Query has description Query is the root object of all queries.
✔  Field Query.client has description Get client info.
✔  Field htmlDocument was added to object type Query
✔  Field timeZones was added to object type Query
✔  Field Random.float has description Generate a float.
✔  Field Random.int has description Generate a integer.
✔  Field Random.string has description Generate a string.
✔  Type TimeZone was added
✔  Type TimeZoneWhere was added
✔  Field URL.domainName has description The host as a domain name.
✔  Field URL.host has description The host.
✔  Field htmlDocument was added to object type URL
✔  Field URL.path has description The path.
✔  Field URL.port has description The port.
✔  Field URL.query has description The query.
✔  Field URL.scheme has description The scheme.
✔  Field URL.url has description The full URL.

As always, you can also fetch and compare the schema in our schema repository.