2011-11-15
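The behavior of Cassandra Secondary indexes described here was verified against version 1.0. In some places I infer the likely implementation from the observed results, so I cannot vouch for their correctness. If I got something wrong, I will come back and fix it later... :P
As for Cassandra's Secondary indexes feature, this is what the semi-official documentation says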
create keyspace testks
with strategy_options=[{replication_factor:1}]
and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';
use testks;
create column family employee
  with comparator = UTF8Type
  and default_validation_class = UTF8Type
  and column_metadata = [
    {column_name : age,
     validation_class : LongType},
    {column_name : name,
     validation_class : UTF8Type},
    {column_name : dept,
     validation_class : UTF8Type,
     index_type : KEYS}
  ];
[default@testks] list employee;
Using default limit of 100
-------------------
RowKey: 3
=> (column=age, value=3, timestamp=1320997600912006)
=> (column=dept, value=rd, timestamp=1320997600913000)
=> (column=name, value=tester3, timestamp=1320997600912005)
-------------------
RowKey: 6
=> (column=age, value=6, timestamp=1320997600913008)
=> (column=dept, value=mis, timestamp=1320997600913009)
=> (column=name, value=tester6, timestamp=1320997600913007)
-------------------
RowKey: 5
=> (column=age, value=5, timestamp=1320997600913005)
=> (column=dept, value=rd, timestamp=1320997600913006)
=> (column=name, value=tester5, timestamp=1320997600913004)
-------------------
RowKey: 10
=> (column=age, value=10, timestamp=1320997600913020)
=> (column=dept, value=mis, timestamp=1320997600913021)
=> (column=name, value=tester10, timestamp=1320997600913019)
-------------------
RowKey: 8
=> (column=age, value=8, timestamp=1320997600913014)
=> (column=dept, value=mis, timestamp=1320997600913015)
=> (column=name, value=tester8, timestamp=1320997600913013)
-------------------
RowKey: 2
=> (column=age, value=2, timestamp=1320997600912003)
=> (column=dept, value=mis, timestamp=1320997600912004)
=> (column=name, value=tester2, timestamp=1320997600912002)
-------------------
RowKey: 1
=> (column=age, value=1, timestamp=1320997600912000)
=> (column=dept, value=rd, timestamp=1320997600912001)
=> (column=name, value=tester1, timestamp=1320997600897000)
-------------------
RowKey: 9
=> (column=age, value=9, timestamp=1320997600913017)
=> (column=dept, value=rd, timestamp=1320997600913018)
=> (column=name, value=tester9, timestamp=1320997600913016)
-------------------
RowKey: 4
=> (column=age, value=4, timestamp=1320997600913002)
=> (column=dept, value=mis, timestamp=1320997600913003)
=> (column=name, value=tester4, timestamp=1320997600913001)
-------------------
RowKey: 7
=> (column=age, value=7, timestamp=1320997600913011)
=> (column=dept, value=rd, timestamp=1320997600913012)
=> (column=name, value=tester7, timestamp=1320997600913010)
10 Rows Returned.
[default@testks] get employee where age = 10;
No indexed columns present in index clause with operator EQ
[default@testks] get employee where age >= 10;
No indexed columns present in index clause with operator EQ
[default@testks] get employee where age <= 10;
No indexed columns present in index clause with operator EQ
[default@testks] get employee where dept = 'rd';
-------------------
RowKey: 3
=> (column=age, value=3, timestamp=1320997600912006)
=> (column=dept, value=rd, timestamp=1320997600913000)
=> (column=name, value=tester3, timestamp=1320997600912005)
-------------------
RowKey: 5
=> (column=age, value=5, timestamp=1320997600913005)
=> (column=dept, value=rd, timestamp=1320997600913006)
=> (column=name, value=tester5, timestamp=1320997600913004)
-------------------
RowKey: 1
=> (column=age, value=1, timestamp=1320997600912000)
=> (column=dept, value=rd, timestamp=1320997600912001)
=> (column=name, value=tester1, timestamp=1320997600897000)
-------------------
RowKey: 9
=> (column=age, value=9, timestamp=1320997600913017)
=> (column=dept, value=rd, timestamp=1320997600913018)
=> (column=name, value=tester9, timestamp=1320997600913016)
-------------------
RowKey: 7
=> (column=age, value=7, timestamp=1320997600913011)
=> (column=dept, value=rd, timestamp=1320997600913012)
=> (column=name, value=tester7, timestamp=1320997600913010)
5 Rows Returned.
[default@testks] get employee where dept >= 'rd';
No indexed columns present in index clause with operator EQ
[default@testks] get employee where dept <= 'rd';
No indexed columns present in index clause with operator EQ
[default@testks] get employee where dept = 'rd' and age >=1 and age <=5 and name = 'tester5';
-------------------
RowKey: 5
=> (column=age, value=5, timestamp=1320997600913005)
=> (column=dept, value=rd, timestamp=1320997600913006)
=> (column=name, value=tester5, timestamp=1320997600913004)
1 Row Returned.
[default@testks] get employee where age >=1 and age <=5 and name = 'tester5' and dept = 'rd';
-------------------
RowKey: 5
=> (column=age, value=5, timestamp=1320997600913005)
=> (column=dept, value=rd, timestamp=1320997600913006)
=> (column=name, value=tester5, timestamp=1320997600913004)
1 Row Returned.
[default@testks] get employee where dept = 'rd' and age >=1 and age <=3;
-------------------
RowKey: 3
=> (column=age, value=3, timestamp=1320997600912006)
=> (column=dept, value=rd, timestamp=1320997600913000)
=> (column=name, value=tester3, timestamp=1320997600912005)
-------------------
RowKey: 1
=> (column=age, value=1, timestamp=1320997600912000)
=> (column=dept, value=rd, timestamp=1320997600912001)
=> (column=name, value=tester1, timestamp=1320997600897000)
2 Rows Returned.
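When a query has conditions on several columns, it works as long as one of those columns has an index and is queried with "=". The other columns need no index of their own and can be queried however you like (the transcript above tries =, >= and <=). This is very different from the Datastore on Google App Engine, where a query on multiple columns only succeeds after you manually define a composite index, which is a hassle.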
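To make the rule above concrete, here is a toy in-memory model in Python that reproduces the results in the transcript. It is only my guess at the general shape (look up candidates through the one indexed "=" clause, then filter the remaining clauses row by row) and is not Cassandra's actual code. The names ROWS, build_index and get_where are made up for this sketch.

```python
import operator

# Rows copied from the `list employee` output: odd keys are 'rd', even keys 'mis'.
ROWS = {
    str(i): {"age": i, "dept": "rd" if i % 2 == 1 else "mis", "name": f"tester{i}"}
    for i in range(1, 11)
}

OPS = {"=": operator.eq, ">=": operator.ge, "<=": operator.le}


def build_index(rows, column):
    """Map each value of `column` to the set of row keys holding that value."""
    index = {}
    for key, cols in rows.items():
        index.setdefault(cols[column], set()).add(key)
    return index


# Only dept is indexed, as in the column family definition.
INDEXES = {"dept": build_index(ROWS, "dept")}


def get_where(rows, indexes, clauses):
    """Evaluate AND-ed clauses given as (column, operator, value) tuples."""
    # Assumption: some clause must be an equality on an indexed column.
    driver = next((c for c in clauses if c[1] == "=" and c[0] in indexes), None)
    if driver is None:
        raise ValueError("No indexed columns present in index clause with operator EQ")
    candidates = indexes[driver[0]].get(driver[2], set())
    # Every clause is then checked against each candidate row.
    # Keys are sorted numerically here; Cassandra returns them in its own order.
    return [
        key
        for key in sorted(candidates, key=int)
        if all(OPS[op](rows[key][col], val) for col, op, val in clauses)
    ]


print(get_where(ROWS, INDEXES, [("dept", "=", "rd"), ("age", ">=", 1), ("age", "<=", 3)]))
# ['1', '3']
```

Calling get_where with only age clauses raises the same "No indexed columns present" error as the CLI, because age has no entry in INDEXES. The order of the clauses does not matter in this model, which matches the two tester5 queries above.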
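employee.employee_age_idx-h-1-Data.db 741 bytes
employee.employee_dept_idx-h-1-Data.db 266 bytes
I do not know whether having indexes on several of the columns in one query (e.g. "get employee where dept = 'rd' and age = 1", with both dept and age indexed) speeds the query up noticeably, but I am quite sure that the more indexes the system has to maintain, the heavier the load. So keep the number of indexes moderate: not every important column needs one, and you can add one when it is actually needed.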
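The point about index maintenance cost can be illustrated with another toy Python model. It assumes an index is a map from column value to row keys, so every write to an indexed column has to add an entry and, when an old value is overwritten, remove the stale one. IndexedTable and load are hypothetical names for this sketch; it only counts index changes and says nothing about how Cassandra really stores or compacts its index files.

```python
class IndexedTable:
    """Toy table that counts how many index entries are added or removed."""

    def __init__(self, indexed_columns):
        self.rows = {}
        self.indexes = {col: {} for col in indexed_columns}
        self.index_mutations = 0

    def insert(self, key, column, value):
        row = self.rows.setdefault(key, {})
        if column in self.indexes:
            index = self.indexes[column]
            old = row.get(column)
            if old != value:
                if old is not None:
                    # Overwrite: the stale index entry has to go first.
                    index[old].discard(key)
                    self.index_mutations += 1
                index.setdefault(value, set()).add(key)
                self.index_mutations += 1
        row[column] = value


def load(table):
    """Insert the same ten employee rows used in the transcript."""
    for i in range(1, 11):
        key = str(i)
        table.insert(key, "age", i)
        table.insert(key, "dept", "rd" if i % 2 == 1 else "mis")
        table.insert(key, "name", f"tester{i}")
    return table.index_mutations


print(load(IndexedTable([])))               # 0
print(load(IndexedTable(["dept"])))         # 10
print(load(IndexedTable(["dept", "age"])))  # 20
```

In this model the extra work grows with the number of indexed columns written, and overwriting an indexed value costs two index changes instead of one, which is the intuition behind adding indexes only where they are needed.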