A question for the Oracle/RDBMS experts… At photo.net, we have an Oracle 9i database approximately 50 GB in size. The I/O stats show that Oracle is doing about 2 TB of reading per day from the disk drives. In other words, the entire database is read 40 times per day, or 2-3 times per hour during peak periods. Our server has 16 GB of RAM, of which maybe half is devoted to caching database blocks. I’m wondering if this is too much swapping in the age of relatively cheap RAM. Should we have a server with 32 GB or even more RAM and basically try to keep the entire database in RAM at all times, maybe having physical reads be 4X the database size per day?
Comments would be appreciated.
Hmm, sounds like some parameters need to be tuned somewhere. Perhaps there are mismatches between database block sizes and filesystem block sizes (that is, if you’re not running the tablespaces on raw disk partitions). That might cause you to read large blocks off the disk when you only need a small part of each block. Perhaps some settings are too small, causing data to be read more often. Is the database too full of pinned tables? I would think that you definitely need to reduce some memory settings if the system is swapping at the OS level.
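For example, a quick way to check the block-size-related settings from SQL*Plus (a sketch; db_block_size and db_file_multiblock_read_count are standard Oracle parameters):

    -- Database block size, and how many blocks one multiblock read pulls in.
    SHOW PARAMETER db_block_size
    SHOW PARAMETER db_file_multiblock_read_count

    -- The same values from any SQL client:
    SELECT name, value
      FROM v$parameter
     WHERE name IN ('db_block_size', 'db_file_multiblock_read_count');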
Have you analyzed your tables? Do you regularly rebuild your indexes? I would guess you probably have, but those are things to try. Oracle can do some strange things with database blocks.
Perhaps there are some untuned queries causing too many full table scans. Perhaps some de-normalization needs to be performed.
Any chance that some search engine is winding its way through your database?
My guess is that a photo site is going to be swept from top to bottom by several web spiders every day – I know my personal gallery is. That pretty much makes it impossible for the database to keep anything in cache, unless your cache is the size of the database. The only thing you can hope for is that at least the most important indexes stay in RAM. Make sure your queries are using indexes rather than sequential scans, and then make sure you have enough cache to keep the indexes in memory.
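For example, a minimal sketch of checking a query's plan (the table name and bind variable are placeholders; DBMS_XPLAN assumes 9.2 or later, and on older releases you would query PLAN_TABLE directly):

    EXPLAIN PLAN FOR
      SELECT subject, posted
        FROM forum_messages           -- placeholder table name
       WHERE topic_id = :topic_id;    -- placeholder predicate

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
    -- Look for INDEX RANGE SCAN rather than TABLE ACCESS FULL in the plan output.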
Are you actually able to determine that the I/O activity is reads of the database, or could it be virtual memory activity that is occurring as Oracle extracts the data, moves it around in memory, and passes it to the other parts of the system?
Greg: I think Oracle is using 8K blocks and the file system is 2K (standard Linux) on the old system, going to 4K on the new (ext3). I don’t think there is significant OS swapping. We have a contract dba who does analyze the tables regularly and probably rebuilds the indices as necessary. I don’t believe that we have any production queries that require full table scans of large tables; something like that would bring the site down (we serve more than 2 million pages per day).
Tom: I’m 99% sure that the physical reads are real since the stats come from the database admin who gets them from Oracle itself.
Paul: If we are going to serve these spiders, it sounds as though our cache might need to be the same size as the database. The spiders don’t request JPEGs, do they? That would have a nasty effect on our photo server (the photos themselves are in a standard file system).
My company serves 5 million pages per day off two database instances with only 2GB of RAM each, with about 0.5TB total database size, but we still manage 99.8% cache hit rates and run off ancient 2000-vintage Sun hardware (think 450MHz CPUs). Of course, the locality patterns of our application may differ from yours.
On a 16GB system, you should be able to assign at least 75% of that RAM to your buffer cache. Check whether you have unnecessary Oracle options like the Java components enabled, and disable them if ACS doesn’t require them (although they may be a dependency of InterMedia if you use that for full-text search). Also, Oracle does not use the OS’s buffer cache, so you should tune that down from the defaults. Give the freed-up RAM to the buffer cache and see if that improves your disk I/O.
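A rough sketch of what that looks like with the 9i dynamic SGA (the 12000M figure is purely illustrative; size it to your own workload):

    SHOW PARAMETER sga_max_size
    SHOW PARAMETER db_cache_size
    SHOW PARAMETER shared_pool_size

    -- db_cache_size can be raised online only up to sga_max_size; going beyond
    -- that means changing the spfile and restarting the instance.
    ALTER SYSTEM SET db_cache_size = 12000M SCOPE = SPFILE;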
I am assuming you are not making some methodological error in how you count the disk I/O stats. Oracle reports “logical reads”, which include reads from the buffer cache as well as actual physical disk reads; if that’s where the 2TB number comes from, it’s perfectly normal. OEM’s “performance overview” will report the stats that matter, such as cache hit rates, top waits, and the percentage of queries requiring soft or hard parses; you should start with that as the basis for performance tuning.
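To separate the two yourself, the instance-level counters are in V$SYSSTAT (cumulative since instance startup, so look at deltas rather than absolute values); a sketch:

    SELECT name, value
      FROM v$sysstat
     WHERE name IN ('db block gets', 'consistent gets', 'physical reads');

    -- Conventional buffer cache hit ratio:
    --   1 - physical reads / (db block gets + consistent gets)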
Oracle has a performance reporting system known as statspack which, if not already installed, your DBA should be able to install for you. This would allow you to schedule snapshots to be taken of all the relevant performance measures (and thankfully in recent editions of Oracle they are plentiful). There is some overhead with this, but it isn’t excessive; it should still be profiled on a QA system first, though.
Schedule these snapshots perhaps every 15 mins during a representative workload period. Then you can use the provided reporting tool to produce a delta between two snapshots. This will show SQL queries ordered by I/O, which will potentially tell you why Oracle can’t satisfy these queries from its cache and is going out to disk. (Check the explain plan for these queries to confirm that they are as you would expect.)
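For reference, the usual statspack workflow is roughly the following (a sketch, run as the PERFSTAT user once your DBA has installed statspack with ?/rdbms/admin/spcreate.sql; spauto.sql schedules hourly snapshots via DBMS_JOB, so adjust its interval if you want 15-minute granularity):

    EXEC statspack.snap;              -- take a snapshot by hand
    @?/rdbms/admin/spauto.sql         -- or schedule recurring snapshots
    @?/rdbms/admin/spreport.sql       -- report the delta between two snapshot ids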
The statspack report will help identify those queries that are producing a lot of I/O and will allow you to target your tuning efforts more precisely. You can increase the granularity of the snapshots as you home in on the poor performing queries.
As an approach I would suggest that you try to identify the poor performing queries that are producing the I/O rather than try to solve this initially with more memory. As you add more memory and enlarge the SGA, it is likely the CPU utilization will increase as Oracle has to spend more time managing the LRU lists, latches etc. that it needs to manage the SGA.
If you feel the need for an on-site expert eye, may I suggest Jonathan Lewis, who is often in the US.
Plus: Indexes in Oracle rarely benefit from regular rebuilding to improve performance. It is usually only done to reclaim space or in a very targeted fashion. If your DBA is regularly rebuilding indexes, it can become a waste of resources and each time you run the risk of leaving an index in an invalid state.
Just to mention: in 10g, statspack became the Automatic Workload Repository (AWR), and I believe it is installed by default with scheduled snapshots each hour. Reports are available via Enterprise Manager (web GUI) or command-line SQL – e.g. @$ORACLE_HOME/rdbms/admin/awrrpt.sql
I wasn’t sure which version of the RDBMS you were running 8)
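If it does turn out to be 10g, the equivalent is roughly this (a sketch; AWR snapshots are hourly by default, but you can force one before running the report):

    EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;
    @$ORACLE_HOME/rdbms/admin/awrrpt.sql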
Fazal: I don’t think we have an error in the stats. Here’s a snippet of email from our dba…
Currently I am delivering 83% of the requested data from memory.
Buffer Pool:  Logical_Reads  Physical_Reads  HIT_RATIO
DEFAULT:      4134051550     6885363094      83
I would naively expect to serve 99% of data from the cache for day-to-day site operations, with maybe some degradation in cache performance when a robot comes through and pulls out discussion forum postings from 1997, etc.
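For what it’s worth, that figure can be cross-checked straight from the instance; a sketch against the standard V$BUFFER_POOL_STATISTICS view (counters are cumulative since startup):

    SELECT name,
           physical_reads,
           db_block_gets + consistent_gets AS logical_reads,
           ROUND(100 * (1 - physical_reads /
                 NULLIF(db_block_gets + consistent_gets, 0))) AS hit_ratio
      FROM v$buffer_pool_statistics;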
Re: robots trampling through your site…
You can see what the Googlebot is doing using the webmaster tools:
https://www.google.com/webmasters/tools/
You can also direct the attentions of the Yahoo and Google robots using the new sitemaps standard. Create a sitemap (an XML listing of all the pages you’re interested in having spidered) once per day from a db dump.
You can persuade the search engines to add your new content quickly, and only your new content, without trawling through the unchanged sections of your site.
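A minimal sitemap is just an XML file along these lines (a sketch; the URL and dates are placeholders for whatever the nightly db dump emits, and you reference the generated file from robots.txt or submit it through the webmaster tools):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- One <url> entry per page, generated from the db dump. -->
      <url>
        <loc>http://www.photo.net/example-page</loc>  <!-- placeholder URL -->
        <lastmod>2006-08-01</lastmod>                 <!-- placeholder date -->
        <changefreq>daily</changefreq>
      </url>
    </urlset>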
Would be interesting to know how hard Google is indexing photo.net at the moment…