This post outlines the parameters and considerations involved in creating a basic data warehouse for any need.
With any data warehouse the aim is clear: to gather and process large amounts of data in the shortest possible time. If you can achieve this, the goal of a data warehouse has been reached.
Note that these parameters are not set in stone and some may need to be tweaked to fit your respective environment.
It is key to plan your data warehouse appropriately from the word go. During these initial steps CPU and I/O are key; failing to gauge these correctly will result in poor performance later on when you load data or look to scale out your DW. Tuning and changing parameters can only do so much before the problem becomes under-performing hardware.
While no two environments are the same, a good starting point as per Oracle's recommendations is 200 MB/s of I/O throughput per CPU core; a 16-core server, for example, would therefore need around 3.2 GB/s of aggregate throughput. The link between server and storage should be Fibre Channel to accommodate this.
db_block_checksum – TRUE
db_file_multiblock_read_count – leave unset (Oracle will auto-tune it from the block size and I/O capability)
db_block_size – 32k
job_queue_processes – 2
log_buffer – 10485760 (10MB)
log_checkpoint_timeout – 3600
open_cursors – 1000
processes – 500 (max number of concurrent users x2)
sessions – 1000 (number of processes x2)
session_cached_cursors – 500
shared_pool_reserved_size – 40MB
shared_pool_size – 400MB, or 0 for Oracle to manage
sga_max_size – 2200MB
undo_retention – 90000
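As a hedged sketch of how the settings above could be applied on an spfile-managed instance (adjust every value to your own environment first):

```sql
-- Hypothetical example of applying the settings above via ALTER SYSTEM.
-- db_block_size cannot be changed here; it is fixed at CREATE DATABASE time.
ALTER SYSTEM SET db_block_checksum = TRUE SCOPE = SPFILE;
ALTER SYSTEM SET log_buffer = 10485760 SCOPE = SPFILE;
ALTER SYSTEM SET log_checkpoint_timeout = 3600 SCOPE = BOTH;
ALTER SYSTEM SET open_cursors = 1000 SCOPE = BOTH;
ALTER SYSTEM SET processes = 500 SCOPE = SPFILE;
ALTER SYSTEM SET sessions = 1000 SCOPE = SPFILE;
ALTER SYSTEM SET session_cached_cursors = 500 SCOPE = SPFILE;
ALTER SYSTEM SET shared_pool_reserved_size = 40M SCOPE = SPFILE;
ALTER SYSTEM SET shared_pool_size = 400M SCOPE = SPFILE;
ALTER SYSTEM SET sga_max_size = 2200M SCOPE = SPFILE;
ALTER SYSTEM SET undo_retention = 90000 SCOPE = BOTH;
```

Parameters set with SCOPE = SPFILE only take effect after the instance is restarted.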
Statistics gathering is a key part of a data warehouse. Gathering statistics allows the optimizer to make educated decisions on the best execution plan, thus reducing query run time. Tuning a data warehouse with a star schema enabled is relatively simple; without diving too much into every option, a simple configuration would be to create bitmap indexes on each foreign key column in the fact table or tables.
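As a sketch, assuming a hypothetical fact table SALES with foreign keys to customer and product dimensions (all names here are illustrative, not from a real schema):

```sql
-- Bitmap indexes on the fact table's foreign key columns,
-- which the star transformation relies on.
CREATE BITMAP INDEX sales_cust_bix ON sales (cust_id);
CREATE BITMAP INDEX sales_prod_bix ON sales (prod_id);

-- Gather statistics for the whole schema, letting Oracle
-- decide which objects are stale (GATHER AUTO).
BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(
    ownname => 'DWH',            -- hypothetical schema name
    options => 'GATHER AUTO');
END;
/
```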
dbms_stats.gather_schema_stats – GATHER AUTO
STAR_TRANSFORMATION_ENABLED – TRUE (If using Star Schema)
OPTIMIZER_MODE – ALL_ROWS
QUERY_REWRITE_ENABLED – TRUE
QUERY_REWRITE_INTEGRITY – TRUSTED
HASH_JOIN_ENABLED – TRUE (obsolete from 10g onwards, where hash joins are enabled by default)
ALWAYS_ANTI_JOIN – HASH
ALWAYS_SEMI_JOIN – HASH
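A hedged sketch of applying the star transformation and query rewrite settings; using ALTER SESSION first lets you test the effect on a single session before committing system-wide:

```sql
-- Try the settings in the current session only.
ALTER SESSION SET star_transformation_enabled = TRUE;
ALTER SESSION SET query_rewrite_enabled = TRUE;
ALTER SESSION SET query_rewrite_integrity = TRUSTED;

-- Once happy, make them instance-wide and persistent.
ALTER SYSTEM SET star_transformation_enabled = TRUE SCOPE = BOTH;
ALTER SYSTEM SET query_rewrite_enabled = TRUE SCOPE = BOTH;
ALTER SYSTEM SET query_rewrite_integrity = TRUSTED SCOPE = BOTH;
ALTER SYSTEM SET optimizer_mode = ALL_ROWS SCOPE = BOTH;
```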
In order to take full advantage of the hardware provided I would recommend using parallel execution of queries.
In regards to parallel execution a good starting point is below:
Objects smaller than 200 MB should not use any parallelism
Objects between 200 MB and 5GB should use a DOP of 4
Objects beyond 5GB use a DOP of 32
These sizings should be changed to fit your hardware restrictions.
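Following the sizing guide above, the degree of parallelism can be set per object; the table names below are hypothetical:

```sql
-- Small dimension table (under 200MB): no parallelism.
ALTER TABLE products NOPARALLEL;

-- Medium fact table (200MB to 5GB): DOP of 4.
ALTER TABLE sales PARALLEL 4;

-- Large fact table (over 5GB): DOP of 32.
ALTER TABLE sales_history PARALLEL 32;

-- A DOP can also be requested for a single query with a hint:
SELECT /*+ PARALLEL(s, 4) */ COUNT(*) FROM sales s;
```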
parallel_automatic_tuning = true
parallel_threads_per_cpu = Number of cores per cpu x2
cpu_count = xx (verify this matches the actual number of CPUs; Oracle normally detects it automatically)
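To verify what the instance has actually picked up, a quick check against v$parameter:

```sql
-- Confirm the parallel-related parameter values the instance is running with.
SELECT name, value
FROM   v$parameter
WHERE  name IN ('cpu_count',
                'parallel_threads_per_cpu',
                'parallel_automatic_tuning');
```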
Until next time.