select c1, c2, c3 from t1, t2, t3 where c1=c2 and c1=c3With an index on tables t2 and t3 this query can be executed by one scan of table t1.
Doing direct accesses to each row of a table is still much slower than scanning the table sequentially. Reason: for each row, one block must be read from disk. When scanning, many rows are processed per read of a block.
It is possible to use data from the index directly, instead of using the index to guide a read from the table. Consider the query
select author, zip from postings, address where postings.author = address.nameNow we are taking the zip value from the table address, for each author in postings. Suppose we have an index on (name,zip). Then the additional values can be taken straight from the index. This can be a make-or-break optimization.
Remember that an index on multiple columns, e.g. (state, city, zip), is also an index on its prefixes, i.e. (state, city) and (state).
Indexes are only useful for <, <=, =, >=, > comparisons on columns of the same type in where clauses, such that the indexed column is not part of any larger expression. For example, an index on c2 is not useful for the query
select c1, c2 from t1, t2 where c1 = c2*2When indexing a text column, make the index be on a string prefix only. For example to make an index on the first ten characters only of an author's name, use the command alter table postings add index ax (author(10))
Indexes are less useful for values that are repeated, most useful for
unique values.
mysql> explain select distinct ticker from postings where date>'2001-04-23';Here's how an index is used for an equijoin:+----------+-------+---------------+------+---------+------+-------+-------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+----------+-------+---------------+------+---------+------+-------+-------+
| postings | range | date | date | NULL | NULL | 11321 | |
+----------+-------+---------------+------+---------+------+-------+-------+
mysql> explain select distinct ticker from postings, sorted where postings.id=sorted.k;And here's how both are used together:+----------+--------+---------------+---------+---------+----------+--------+-------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+----------+--------+---------------+---------+---------+----------+--------+-------------+
| sorted | index | k | k | 4 | NULL | 124113 | Using index |
| postings | eq_ref | PRIMARY | PRIMARY | 4 | sorted.k | 1 | |
+----------+--------+---------------+---------+---------+----------+--------+-------------+
mysql> explain select distinct ticker from postings, sorted
where postings.id=sorted.k and date>'2001-04-23' order by ticker;+----------+-------+---------------+------+---------+-------------+-------+-------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+----------+-------+---------------+------+---------+-------------+-------+-------------+
| postings | range | PRIMARY,date | date | NULL | NULL | 11321 | |
| sorted | ref | k | k | 4 | postings.id | 10 | Using index |
+----------+-------+---------------+------+---------+-------------+-------+-------------+
The same query can be ten times faster when you run it a second time,
because the tables have been loaded into the server's in-memory cache.
The most important guideline is to avoid redundancy. Consider this schema:
postings: first name, last name, address, zip, title, bodyTo reduce redundancy, we should have two tables:
postings: first name, last name, title, body
person: first name, last name, address, zip