
Posted about 1 month ago by Jakob Gillich

Backendless Web Scraping Using the Everbase GraphQL API

Welcome to our third update! This time, we have greatly improved our in-schema documentation and removed several fields that were deprecated in our previous release three months ago. We have also added a powerful new feature: web scraping.

Web scraping lets you fetch and parse websites through our GraphQL API, with no backend required. Here's how to get the latest posts on Hacker News:

{
  url(url: "https://news.ycombinator.com") {
    htmlDocument {
      title
      body {
        submissions: all(selector: "tr.athing") {
          rank: text(selector: "span.rank")
          text(selector: "a.storylink")
          url: attribute(selector: "a.storylink", name: "href")
          attrs: next {
            score: text(selector: "span.score")
            user: text(selector: "a.hnuser")
            comments: text(selector: "a:nth-of-type(3)")
          }
        }
      }
    }
  }
}
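Because the page is fetched and parsed on the API side, a client only has to send the query. The sketch below builds such a request with Python's standard library. It assumes the API follows the common GraphQL-over-HTTP convention (a JSON POST body with a `query` field, answered with `data` and optional `errors`). `ENDPOINT` is a hypothetical placeholder rather than Everbase's real URL, and `build_request` and `run_query` are illustrative names.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Hypothetical placeholder: replace with the GraphQL endpoint from the Everbase docs.
ENDPOINT = "https://example.invalid/graphql"

# A trimmed-down version of the Hacker News query above.
QUERY = """
{
  url(url: "https://news.ycombinator.com") {
    htmlDocument {
      title
    }
  }
}
"""


def build_request(query: str, endpoint: str = ENDPOINT,
                  variables: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Wrap a GraphQL query in a JSON POST request (built here, not sent)."""
    payload: Dict[str, Any] = {"query": query}
    if variables:
        payload["variables"] = variables
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def run_query(query: str, endpoint: str = ENDPOINT) -> Dict[str, Any]:
    """Send the query and return its "data" object, raising if the server reports errors."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=30) as response:
        result = json.loads(response.read().decode("utf-8"))
    if result.get("errors"):
        raise RuntimeError(f"GraphQL errors: {result['errors']}")
    return result["data"]
```

With a real endpoint substituted, `run_query(QUERY)["url"]["htmlDocument"]["title"]` would be expected to return the page title.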

Executing this query returns a result like this:

{
  "data": {
    "url": {
      "htmlDocument": {
        "title": "Hacker News",
        "body": {
          "submissions": [
            {
              "rank": "1.",
              "text": "Anxiety Driven Development",
              "url": "https://andreschweighofer.com/agile/anxiety-in-product-development/",
              "attrs": {
                "score": "106 points",
                "user": "fidrelity",
                "comments": "31 comments"
              }
            },
            {
              "rank": "2.",
              "text": "I'm resigning from my job at Facebook",
              "url": "https://www.facebook.com/timothy.j.aveni/posts/3006224359465567",
              "attrs": {
                "score": "73 points",
                "user": "dredmorbius",
                "comments": "8 comments"
              }
            },
            {
              "rank": "3.",
              "text": "Blur Tools for Signal",
              "url": "https://signal.org/blog/blur-tools/",
              "attrs": {
                "score": "204 points",
                "user": "tosh",
                "comments": "106 comments"
              }
            },
            // ...
          ]
        }
      }
    }
  }
}

This makes it easy to fetch data from websites that do not offer an API of their own.
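Scraped values arrive as strings, exactly as they appear on the page ("1.", "106 points"). A client usually wants numbers, so the sketch below converts a response shaped like the one above into typed records. `Submission`, `leading_int` and `parse_submissions` are hypothetical names. Two further assumptions: a comments link without digits (Hacker News shows "discuss") counts as zero comments, and score and user may be missing, since some rows, such as job postings, lack them.

```python
import re
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Submission:
    rank: Optional[int]
    title: str
    url: str
    score: Optional[int]   # None when the row shows no score
    user: Optional[str]    # None when the row shows no author
    comments: int


def leading_int(text: Optional[str]) -> Optional[int]:
    """Return the first number in text ("106 points" -> 106, "1." -> 1), or None."""
    if not text:
        return None
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else None


def parse_submissions(response: Dict[str, Any]) -> List[Submission]:
    """Turn the raw scraping result into typed Submission records."""
    body = response["data"]["url"]["htmlDocument"]["body"]
    submissions = []
    for item in body["submissions"]:
        attrs = item.get("attrs") or {}
        submissions.append(Submission(
            rank=leading_int(item.get("rank")),
            title=item["text"],
            url=item["url"],
            score=leading_int(attrs.get("score")),
            user=attrs.get("user"),
            # Assumption: a comments link without digits ("discuss") means zero comments.
            comments=leading_int(attrs.get("comments")) or 0,
        ))
    return submissions
```

Applied to the sample response above, the first record would have `rank=1`, `score=106`, `user='fidrelity'` and `comments=31`.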

All Changes

Here is a list of all changes:

✖  Field countryName (deprecated) was removed from object type City
✖  Field capitalName (deprecated) was removed from object type Country
✖  Field A (deprecated) was removed from object type DNSRecords
✖  Field AAAA (deprecated) was removed from object type DNSRecords
✖  Field CNAME (deprecated) was removed from object type DNSRecords
✖  Field MX (deprecated) was removed from object type DNSRecords
⚠  Default value 1 was added to argument amount on field Currency.convert
⚠  Default value 18437736874454810000 was added to argument high on field Random.float
⚠  Default value -1.7976931348623157e+308 was added to argument low on field Random.float
⚠  Default value 2147483647 was added to argument high on field Random.int
⚠  Default value -2147483648 was added to argument low on field Random.int
⚠  Default value 16 was added to argument length on field Random.string
✔  Object type City has description A city is a large human settlement.
✔  Field City.continent has description The continent.
✔  Field City.country has description The country.
✔  Field City.geonamesID has description The Geonames.org ID.
✔  Field City.id has description The Wikidata ID.
✔  Field City.location has description The location.
✔  Field City.name has description The name.
✔  Field City.population has description The population.
✔  Field timeZone was added to object type City
✔  Field timeZoneDST was added to object type City
✔  Object type Client has description Information about the client that sent the request.
✔  Field Client.ipAddress has description The IP address.
✔  Field Client.userAgent has description The user agent.
✔  Object type Continent has description A continent is one of several very large landmasses.
✔  Field Continent.geonamesID has description The Geonames.org ID.
✔  Field Continent.id has description The Wikidata ID.
✔  Field Continent.name has description The name.
✔  Field Continent.population has description The population.
✔  Object type Coordinates has description Geographic coordinates.
✔  Field Coordinates.lat has description Latitude.
✔  Field Coordinates.long has description Longitude
✔  Object type Country has description A sovereign state.
✔  Field Country.alpha2Code has description The ISO 3166-1 alpha-2 code.
✔  Field Country.alpha3Code has description The ISO 3166-1 alpha-3 code.
✔  Field Country.callingCodes has description Calling codes.
✔  Field Country.capital has description The capital city.
✔  Field Country.cities has description All cities of the country.
✔  Field Country.continent has description The continent the country is located in.
✔  Field Country.currencies has description All official currencies of the country.
✔  Field Country.geonamesID has description The Geonames.org ID.
✔  Field Country.id has description The Wikidata ID.
✔  Field Country.languages has description All official languages of the country.
✔  Field Country.location has description The location.
✔  Field Country.name has description The name.
✔  Field Country.population has description The population.
✔  Field vatRate was added to object type Country
✔  Field Currency.countries has description Countries that use the currency.
✔  Field Currency.id has description The Wikidata ID.
✔  Field Currency.isoCode has description The ISO 4217 code.
✔  Field Currency.name has description The name.
✔  Field Currency.unitSymbols has description Unit symbols.
✔  Object type DomainName has description Domain Name of the Domain Name System (DNS).
✔  Field a was added to object type DomainName
✔  Field aaaa was added to object type DomainName
✔  Field cname was added to object type DomainName
✔  Field mx was added to object type DomainName
✔  Field DomainName.name has description The domain name.
✔  Field DomainName.records is deprecated
✔  Field DomainName.records has deprecation reason Use fields on domainName itself
✔  Field EmailAddress.address has description The email address.
✔  Field EmailAddress.domainName has description The host as a domain name.
✔  Field ok was added to object type EmailAddress
✔  Field EmailServiceProvider.domainName has description The domain name.
✔  Field smtpOk was added to object type EmailServiceProvider
✔  Type HTMLDocument was added
✔  Type HTMLNode was added
✔  Description "Can be either a IPv4 or a IPv6 address. This product includes GeoLite2 data created by MaxMind, available from www.maxmind.com." on type IPAddress has changed to "Internet Protocol address. Can be either a IPv4 or a IPv6 address. This product includes GeoLite2 data created by MaxMind, available from www.maxmind.com."
✔  Field IPAddress.address has description The IP address.
✔  Field IPAddress.city has description The city this IP address belongs to.
✔  Field IPAddress.country has description The country this IP address belongs to.
✔  Field IPAddress.type has description The IP address type.
✔  Field Language.alpha2Code has description The ISO 639-1 code.
✔  Field Language.countries has description The countries that use the language.
✔  Field Language.id has description The Wikidata ID.
✔  Field Language.name has description The name.
✔  Field MXRecord.exchange has description The domain name.
✔  Field MXRecord.preference has description The preference value.
✔  Object type Query has description Query is the root object of all queries.
✔  Field Query.client has description Get client info.
✔  Field htmlDocument was added to object type Query
✔  Field timeZones was added to object type Query
✔  Field Random.float has description Generate a float.
✔  Field Random.int has description Generate a integer.
✔  Field Random.string has description Generate a string.
✔  Type TimeZone was added
✔  Type TimeZoneWhere was added
✔  Field URL.domainName has description The host as a domain name.
✔  Field URL.host has description The host.
✔  Field htmlDocument was added to object type URL
✔  Field URL.path has description The path.
✔  Field URL.port has description The port.
✔  Field URL.query has description The query.
✔  Field URL.scheme has description The scheme.
✔  Field URL.url has description The full URL.
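The new default bounds for Random.int and the default lower bound for Random.float appear to span the full range of GraphQL's built-in scalars. The GraphQL specification defines Int as a signed 32-bit integer and Float as an IEEE 754 double. The worked check below confirms the numbers. The constant names and the `fits_int32` helper are illustrative only.

```python
import struct
import sys

# Default values copied from the changelog above.
RANDOM_INT_LOW = -2147483648
RANDOM_INT_HIGH = 2147483647
RANDOM_FLOAT_LOW = -1.7976931348623157e308


def fits_int32(value: int) -> bool:
    """True if value fits in a signed 32-bit integer, the range of GraphQL's Int scalar."""
    try:
        struct.pack("<i", value)  # raises struct.error when value is out of range
        return True
    except struct.error:
        return False


# The Random.int defaults are the smallest and largest signed 32-bit integers.
print(fits_int32(RANDOM_INT_LOW), fits_int32(RANDOM_INT_LOW - 1))    # prints: True False
print(fits_int32(RANDOM_INT_HIGH), fits_int32(RANDOM_INT_HIGH + 1))  # prints: True False

# The Random.float lower default is the most negative finite IEEE 754 double.
print(RANDOM_FLOAT_LOW == -sys.float_info.max)  # prints: True
```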

As always, you can also fetch and compare the schema in our schema repository.