CSRF¶
The same-origin policy does not prevent cross-origin requests from being sent; it only prevents the sending page from processing or inspecting the response. So?
If you're logged into bank.ch and bank.ch doesn't "properly" handle requests --
i.e., it only checks the cookie to make sure you're authenticated -- then
evil.com can send a request to, say, transfer funds out of your account. For
example, the attacker can send a POST request via an HTML <form> element or
just use the XHR/fetch JavaScript APIs.
POST requests are usually the main concern because they are meant to modify state server-side. But some web sites take query parameters from a GET request and modify state server-side too (e.g., a form can submit via GET just as well), and an attacker can abuse this just as easily.
Defenses¶
Goal: server wants to make sure request is coming from same origin (or from an origin it trusts).
CSRF tokens¶
The most common way to deal with this is to create a random token and include it in your forms as a hidden value. Server-side, you can now just check that the request body (after parsing) contains the right token.
- What happens if you forget to add token to form field?
- What happens if you forget to check tokens server-side?
- What do you do about GET requests?
- What's the trade-off? (How does this play with CDNs?)
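A minimal sketch of the token approach in Node.js, using only the built-in crypto module. The helper names (generateCsrfToken, isValidCsrfToken, csrfField) and the csrf_token field name are assumptions made for illustration; where the token is stored (e.g., in the session) is left out.

```javascript
const crypto = require('crypto');

// Hypothetical helper: create an unpredictable token (64 hex characters).
function generateCsrfToken() {
  return crypto.randomBytes(32).toString('hex');
}

// Hypothetical helper: compare the token from the parsed request body
// against the one the server stored for this user, in constant time.
function isValidCsrfToken(storedToken, submittedToken) {
  if (typeof storedToken !== 'string' || typeof submittedToken !== 'string') {
    return false; // A missing token is treated as a failed check.
  }
  const expected = Buffer.from(storedToken);
  const actual = Buffer.from(submittedToken);
  // timingSafeEqual throws if the lengths differ, so check them first.
  if (expected.length !== actual.length) return false;
  return crypto.timingSafeEqual(expected, actual);
}

// Hypothetical helper: the hidden field to include in each form.
// The token is hex, so it needs no HTML escaping here.
function csrfField(token) {
  return `<input type="hidden" name="csrf_token" value="${token}">`;
}
```

Note that the check only helps if it runs on every state-changing request; a handler that skips isValidCsrfToken is as exposed as one with no token at all.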
Referrer and origin headers¶
The browser sends the referrer header (spelled Referer on the wire) to the server to indicate the URL of the page that made the request. The full URL is not great for privacy, and many organizations filter referrer headers.
The Origin header includes just the origin of the page. It is mostly sent with POST requests and was largely designed to deal with CSRF. Not all browsers support the Origin header.
- What's the trade-off when compared to tokens?
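A header-based check might look roughly like the following sketch. The function name isTrustedRequest and the allow-list are assumptions for illustration; the sketch also assumes header names are lower-cased, as Node's http module provides them, and it takes the strict option of rejecting requests that carry neither header.

```javascript
// Hypothetical allow-list of origins the server trusts.
const TRUSTED_ORIGINS = new Set(['https://bank.ch']);

// Returns true if the Origin header (or, failing that, the Referer header)
// names a trusted origin.
function isTrustedRequest(headers, trusted = TRUSTED_ORIGINS) {
  const source = headers['origin'] || headers['referer'];
  if (!source) return false; // Strict choice: reject when both are absent.
  try {
    // Compare whole origins (scheme + host + port), not string prefixes,
    // so that https://bank.ch.evil.com does not match https://bank.ch.
    return trusted.has(new URL(source).origin);
  } catch (err) {
    return false; // Not a parsable URL, e.g. "Origin: null".
  }
}
```

Rejecting requests with no header is the conservative choice, but it can break legitimate users whose headers are filtered, which is part of the trade-off asked about above.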
XSS¶
XSS is a way of injecting scripts that execute client-side unintentionally.
Stored XSS¶
A classic example of an XSS attack is a forum that renders HTML-stylized user
comments. If the forum is not "properly" implemented -- i.e., if the server
does not properly filter data -- a malicious user can upload a comment that
contains a <script> element. Any other user that then views the page that
contains the comment will then be pwned: the malicious script will run in the
context of the victim web app. So, for example, the script can steal the user's
cookie, leak data, etc.
This is called a stored XSS attack because the script is stored in the forum database and happily shipped to the client (browser) by the forum web app.
Reflected XSS¶
This is not the only vector though. For example, this link
https://duckduckgo.com/?q=xss when you click on it will not only navigate you
to http://duckduckgo.com but also populate the search bar. If they didn't
implement sanitization properly, an attacker can craft a link that includes
code. For example, they can set the query string to
"><script>alert('pwn')</script> to close the <input value="..."> field and
add a script element. (DuckDuckGo actually handles this so this won't work if
you try it.)
This XSS attack requires the victim to click on a link (or type in a URL, submit a GET form, etc.). This is called a reflected XSS attack: it takes advantage of the fact that the web site reflects what's in the URL on the page.
Defenses¶
Sanitization¶
The typical approach is to consider all user data as untrusted and then sanitize it by:
- Encoding unsafe strings (e.g., HTML tags)
- Filtering unsafe elements (e.g., <script>, <a href="javascript://...">)
- Rejecting strings that aren't explicitly safe patterns.
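As a sketch of the first option (encoding), the function below replaces the five characters that are significant in HTML text and quoted attribute values. The names escapeHtml and renderSearchBox are assumptions for illustration, and this encoding is only appropriate for those two contexts, not for data placed inside a script element or a URL.

```javascript
// Characters that can change how HTML text or a quoted attribute is parsed.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Hypothetical helper: encode untrusted data for HTML text or a quoted
// attribute value.
function escapeHtml(untrusted) {
  return String(untrusted).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Hypothetical rendering of a search box that reflects the query string.
function renderSearchBox(query) {
  return `<input name="q" value="${escapeHtml(query)}">`;
}
```

With this in place, a query containing a double quote followed by a script element stays inside the value attribute as inert text instead of closing the input field.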
This is generally not super easy:
- Encoding functions can be tricked.
- Different encodings can make it super hard to actually filter unsafe elements.
- Regular expressions are also hard to get right.
You often trade off what kind of content users are allowed to post against
security (e.g., how do you write a blog post about XSS if any mention of
<script> is disallowed?).
HTML is generally hard to think about: different browsers parse things differently -- many are permissive to avoid breaking real pages. DOMPurify is a good way to sanitize content because it actually uses browser APIs to safely render the untrusted content.
iframe sandbox¶
The HTML iframe tag now has a sandbox attribute that can be used to sandbox
untrusted content. For example, you can put content in an iframe sandbox where
no script execution is allowed.
CSP¶
We'll talk about CSP more next week, but the short of it is: it lets you disable inline-scripts, whitelist scripts a page is allowed to execute, and whitelist trusted origins. (It really does way more than this, probably too much.)
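To make this a little more concrete, here is a rough sketch of building a Content-Security-Policy header value in Node.js. The helper name buildCspHeader and the cdn.example.com origin are assumptions for illustration; the policy shown only allows scripts from the page's own origin and one other origin, and it blocks inline scripts because 'unsafe-inline' is not listed.

```javascript
// Hypothetical policy; https://cdn.example.com is a placeholder origin.
const policy = {
  'default-src': ["'self'"],
  'script-src': ["'self'", 'https://cdn.example.com'],
};

// Hypothetical helper: serialize directives into a header value, with
// sources separated by spaces and directives separated by semicolons.
function buildCspHeader(directives) {
  return Object.entries(directives)
    .map(([name, sources]) => `${name} ${sources.join(' ')}`)
    .join('; ');
}

// In a Node http handler this value would be sent with something like:
//   res.setHeader('Content-Security-Policy', buildCspHeader(policy));
```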
SQLi and other injections¶
You don't need to only worry about sanitizing JS that will be sent to the client. Data is stored in databases, often by creating string queries.
When you concatenate untrusted strings into a query, the meaning of the query may change. For example, in Node.js, this is bad:
...
const user = req.query.user;
const query = `SELECT * FROM messages WHERE name = '${user}'`;
...
db.query(query);
Why? An attacker can end the statement and insert their own code, for example by supplying a value for user that contains a single quote.
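As an illustration, the sketch below wraps the same string construction in a hypothetical buildQuery function and prints the query that results from two made-up attacker inputs. Whether a second statement actually executes depends on the database driver, so treat the first input as an example of the idea rather than a guaranteed exploit.

```javascript
// Same template as the vulnerable code, wrapped in a hypothetical function.
function buildQuery(user) {
  return `SELECT * FROM messages WHERE name = '${user}'`;
}

// Hypothetical attacker-controlled values for req.query.user.
const dropTable = "x'; DROP TABLE messages; --";
const matchAll = "' OR '1'='1";

console.log(buildQuery(dropTable));
// SELECT * FROM messages WHERE name = 'x'; DROP TABLE messages; --'

console.log(buildQuery(matchAll));
// SELECT * FROM messages WHERE name = '' OR '1'='1'
```

In both cases the quote in the input closes the string literal early, so the rest of the input is parsed as SQL rather than as a name.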
What can you do with this?
In general this is happening because of a mismatch:
- the web application treats user input as data
- database parser treats user input as code
Other ways to inject¶
There are many other places where user data can cause harm.
- Executing external binaries: when you call out to system() you want to make sure that the user data can't adversely affect the executable, arguments, etc.
- Reading/writing files: when you handle file uploads or read/write files on behalf of a user, don't trust the filename.
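For the file case, one possible sketch is to resolve the user-supplied name against a fixed directory and refuse anything that ends up outside it. The function name safeJoin and the upload directory are assumptions for illustration.

```javascript
const path = require('path');

// Hypothetical directory where uploads are stored.
const UPLOAD_DIR = '/var/app/uploads';

// Hypothetical helper: resolve a user-supplied filename inside baseDir.
// Returns the absolute path, or null if it would fall outside baseDir.
function safeJoin(baseDir, userFilename) {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, String(userFilename));
  // Require the result to be strictly below base; this rejects "../"
  // sequences, absolute paths, and the directory itself.
  return target.startsWith(base + path.sep) ? target : null;
}
```

For external binaries, a similar idea in Node.js is to prefer child_process.execFile with an argument array over building a shell command string, since execFile does not spawn a shell by default.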
Defenses¶
For SQLi you can sanitize user input, but this is generally no longer the recommended approach. Prepared statements are largely the norm:
db.query({ text: 'SELECT * FROM messages WHERE name = $1', values: [user] });
The prepared text says which statement to execute and where the parameters go ($1). The values are provided separately and encoded by the database (not the app).
We sometimes still need to sanitize:
- If the data makes it back to the client, we need to sanitize/filter JavaScript.
- APIs for reading/writing files don't always have a prepared statement-like interface.