Seeing Exadata in action



Last week I was able to attend a 3-day Exadata workshop conducted by performance engineers from Singapore. It was supposed to be a 5-day workshop with hands-on labs, but everything was squeezed (unfortunately, no hands-on) and only the critical sections were tackled. The workshop highlight was the demos, which were as follows:

- loading a large amount of data from an external table
- cell offloading on a typical data warehouse query
- creation of a 400GB tablespace
- linear scalability of storage cells

Although all of these demos were done on Exadata Version 1, and just 13 of the storage servers were running, I can say that the performance was still stunning!

But the real highlight for me was having my scripts run on the database machine ;)

BTW, to get the scripts, check out the Scripts section for the file name and the most recent version of each script.

IMPORTANT NOTE: Diagnostic Pack License is needed for the scripts

All the time they were doing the demos, they were just showing some graphed output of V$SYSSTAT statistics related to Exadata cell effectiveness, plus a customized output of collectl showing the storage servers and the RAC nodes stacked together in a PuTTY session. From there we could see the IO, CPU, and network statistics as they were hammering the full rack Exadata.

So I was also really curious about the database performance numbers. I asked if we could see Enterprise Manager while they were doing the demos, but unfortunately it was not available.

With the spooled output of my scripts (executed on the node where the demo was done), I was able to see the following across SNAP_IDs:

- Load (awr_genwl.sql)
- Top Timed Events (awr_topevents.sql)
- Tablespace and File IO statistics (awr_io_ts.sql, awr_io_file.sql)
- Top SQLs (awr_topsql.sql, awr_topsqlx.sql)

Below are the outputs:

Load:

I was interested in the period when the demo was run (the afternoon of May 20th), the time when I saw the storage servers being hammered and giving 1 GB/s of throughput each. I wanted to see the CPU and IO numbers on the database side, plus the AAS (Average Active Sessions) and other metrics, more like characterizing the workload. You can see on SNAP_ID 1211 that there was a sudden surge in DB Time, IO read MB/s, and CPU utilization.

I was surprised to see that the utilization numbers (rightmost columns) are very low, even for the whole AWR retention period (not included here). Take note that this is the output from one node only, so I can't really conclude what's happening across the entire cluster.


                                                                                                              AWR CPU and IO Workload Report

                         i                        ***                                                       ***                 ***
                         n                      Total                                                     Total               Total                                                                                                           U    S
       Snap              s       Snap   C         CPU                                             A      Oracle                  OS   Physical                                                                            Oracle RMAN   OS    S    Y    I
  Snap Start             t        Dur   P        Time         DB        DB        Bg     RMAN     A         CPU      OS         CPU     Memory      IOPs      IOPs      IOPs      IO r      IO w      Redo           Exec    CPU  CPU  CPU    R    S    O
    ID Time              #        (m)   U         (s)       Time       CPU       CPU      CPU     S         (s)    Load         (s)       (mb)         r         w      redo    (mb)/s    (mb)/s    (mb)/s Sess        /s      %    %    %    %    %    %
------ --------------- --- ---------- --- ----------- ---------- --------- --------- -------- ----- ----------- ------- ----------- ---------- --------- --------- --------- --------- --------- --------- ---- --------- ------ ---- ---- ---- ---- ----

  1208 10/05/20 08:00    2      60.25   8    28920.00      24.08      8.43     10.02     0.00   0.0       18.44    0.29      334.80   32189.62     0.000     0.082     0.050     0.000     0.001     0.000   40     1.549      0    0    1    1    0    0
  1209 10/05/20 09:00    2      59.89   8    28747.20      21.80      8.43     10.29     0.00   0.0       18.72    0.40      320.49   32189.62     0.003     0.078     0.034     0.000     0.002     0.000   40     1.490      0    0    1    1    0    0
  1210 10/05/20 10:00    2      59.87   8    28737.60      24.86      9.43      9.78     0.00   0.0       19.21    0.03      318.18   32189.62     0.006     0.080     0.039     0.000     0.002     0.000   40     1.539      0    0    1    1    0    0
  1211 10/05/20 11:00    2      60.00   8    28800.00    3211.59   1732.40     19.19     0.00   0.9     1751.59    0.39     2157.32   32189.62    56.584     4.423     0.453    55.847     0.477     0.008   41     3.404      6    0    7    7    0    0
  1212 10/05/20 12:00    2      60.02   8    28809.60     353.16     42.58     12.79     0.00   0.1       55.37    0.17      420.83   32189.62    10.901     0.210     0.145    10.734     0.004     0.001   49     2.263      0    0    1    1    0    0
  1213 10/05/20 13:00    2      60.25   8    28920.00    1415.78    465.06     18.32     0.00   0.4      483.38    0.37      839.79   32189.62     0.079     8.471     0.478     0.001     1.321     0.004   42     3.358      2    0    3    2    0    0
  1214 10/05/20 14:00    2      59.80   8    28704.00      20.56      8.60     10.49     0.00   0.0       19.09    0.03      318.41   32189.62     0.012     0.127     0.052     0.000     0.002     0.001   42     1.780      0    0    1    1    0    0

Top Timed Events:

Drilling down on the top events of SNAP_ID 1211, I was wondering if I'd see something different. You can see that half of the DB Time is attributed to "CPU time", but with an AAS of just 0.5, which means there's still a lot of CPU available on the RAC node and the database is probably not blocked. You can also see that there's some external table read going on (hmm, that could be the 1st demo), and notice the event "cell smart table scan".

                                                                                                                   AWR Top Events Report

                         i
                         n
       Snap              s       Snap                                                                                                    A
  Snap Start             t        Dur                                          Event                          Time    Avgwt DB Time      A
    ID Time              #        (m) Event                                     Rank          Waits            (s)     (ms)       %      S Wait Class
------ --------------- --- ---------- ---------------------------------------- ----- -------------- -------------- -------- ------- ------ ---------------
  1208 10/05/20 08:00    2      60.25 reliable message                             1        1674.00           8.80     5.26      37    0.0 Other
  1208 10/05/20 08:00    2      60.25 CPU time                                     2           0.00           8.43     0.00      35    0.0 CPU
  1208 10/05/20 08:00    2      60.25 PX Deq: Slave Session Stats                  3        3150.00           8.23     2.61      34    0.0 Other
  1208 10/05/20 08:00    2      60.25 control file sequential read                 4       11400.00           7.12     0.62      30    0.0 System I/O
  1208 10/05/20 08:00    2      60.25 enq: PS - contention                         5        8342.00           2.66     0.32      11    0.0 Other
  1209 10/05/20 09:00    2      59.89 CPU time                                     1           0.00           8.43     0.00      39    0.0 CPU
  1209 10/05/20 09:00    2      59.89 PX Deq: Slave Session Stats                  2        3130.00           7.27     2.32      33    0.0 Other
  1209 10/05/20 09:00    2      59.89 control file sequential read                 3       11330.00           6.68     0.59      31    0.0 System I/O
  1209 10/05/20 09:00    2      59.89 reliable message                             4        1608.00           4.14     2.58      19    0.0 Other
  1209 10/05/20 09:00    2      59.89 enq: PS - contention                         5        8347.00           2.38     0.28      11    0.0 Other
  1210 10/05/20 10:00    2      59.87 CPU time                                     1           0.00           9.43     0.00      38    0.0 CPU
  1210 10/05/20 10:00    2      59.87 control file sequential read                 2       11314.00           7.98     0.71      32    0.0 System I/O
  1210 10/05/20 10:00    2      59.87 PX Deq: Slave Session Stats                  3        3305.00           7.83     2.37      31    0.0 Other
  1210 10/05/20 10:00    2      59.87 reliable message                             4        1654.00           5.81     3.51      23    0.0 Other
  1210 10/05/20 10:00    2      59.87 enq: PS - contention                         5        8520.00           1.98     0.23       8    0.0 Other
  1211 10/05/20 11:00    2      60.00 CPU time                                     1           0.00        1732.40     0.00      54    0.5 CPU
  1211 10/05/20 11:00    2      60.00 external table read                          2       12567.00         410.76    32.69      13    0.1 User I/O
  1211 10/05/20 11:00    2      60.00 cell smart table scan                        3       86393.00         238.88     2.77       7    0.1 User I/O
  1211 10/05/20 11:00    2      60.00 gc buffer busy acquire                       4        4137.00         196.75    47.56       6    0.1 Cluster
  1211 10/05/20 11:00    2      60.00 external table misc IO                       5         119.00         119.18  1001.48       4    0.0 User I/O
  1212 10/05/20 12:00    2      60.02 cell smart table scan                        1       33606.00         197.91     5.89      56    0.1 User I/O
  1212 10/05/20 12:00    2      60.02 cell smart file creation                     2       92753.00          84.68     0.91      24    0.0 User I/O
  1212 10/05/20 12:00    2      60.02 ASM file metadata operation                  3         830.00          80.80    97.35      23    0.0 Other
  1212 10/05/20 12:00    2      60.02 control file sequential read                 4       11440.00          56.00     4.90      16    0.0 System I/O
  1212 10/05/20 12:00    2      60.02 CPU time                                     5           0.00          42.58     0.00      12    0.0 CPU
  1213 10/05/20 13:00    2      60.25 CPU time                                     1           0.00         465.06     0.00      33    0.1 CPU
  1213 10/05/20 13:00    2      60.25 gc buffer busy acquire                       2       15209.00         379.47    24.95      27    0.1 Cluster
  1213 10/05/20 13:00    2      60.25 external table read                          3        5706.00         211.29    37.03      15    0.1 User I/O
  1213 10/05/20 13:00    2      60.25 gc current block busy                        4        1091.00          94.23    86.37       7    0.0 Cluster
  1213 10/05/20 13:00    2      60.25 external table misc IO                       5          48.00          69.21  1441.82       5    0.0 User I/O
  1214 10/05/20 14:00    2      59.80 CPU time                                     1           0.00           8.60     0.00      42    0.0 CPU
  1214 10/05/20 14:00    2      59.80 PX Deq: Slave Session Stats                  2        3159.00           7.09     2.24      34    0.0 Other
  1214 10/05/20 14:00    2      59.80 control file sequential read                 3       11363.00           6.35     0.56      31    0.0 System I/O
  1214 10/05/20 14:00    2      59.80 reliable message                             4        1706.00           4.86     2.85      24    0.0 Other
  1214 10/05/20 14:00    2      59.80 enq: PS - contention                         5        8312.00           2.23     0.27      11    0.0 Other
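The per-event AAS and average wait columns above can also be reproduced from the raw Waits and Time(s) figures. Again for SNAP_ID 1211, and again assuming the standard AWR arithmetic rather than quoting the script source:

```python
# Reproduce the AAS and Avgwt columns for two SNAP_ID 1211 events.

snap_dur_s = 60.00 * 60          # 60-minute snapshot

cpu_time_s = 1732.40             # "CPU time" Time(s)
print(round(cpu_time_s / snap_dur_s, 1))   # AAS ~0.5 for "CPU time"

ext_read_time_s = 410.76         # "external table read" Time(s)
ext_read_waits  = 12567          # "external table read" Waits
print(round(ext_read_time_s / snap_dur_s, 1))             # AAS ~0.1
print(round(ext_read_time_s / ext_read_waits * 1000, 2))  # Avgwt ~32.69 ms
```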

Tablespace IO statistics:

Having seen the 1 GB/s throughput on each storage server, I wanted to know how the tablespace and datafile latency looked. As you can see below, there are heavy reads on tablespace DG14D7 and heavy writes on DG14D1.

                                                                                                                 AWR Tablespace IO Report

                         i
                         n
       Snap              s       Snap
  Snap Start             t        Dur                        IO                IOPS                                      IOPS                                              IOPS
    ID Time              #        (m) TS                   Rank    Reads Av Reads/s Av Rd(ms) Av Blks/Rd   Writes Av Writes/s Buffer Waits Av Buf Wt(ms) Total IO R+W Total R+W
------ --------------- --- ---------- -------------------- ---- -------- ---------- --------- ---------- -------- ----------- ------------ ------------- ------------ ---------
  1208 10/05/20 08:00    2      60.25 SYSAUX                  1        0          0       0.0        0.0      203           0            0           0.0          203         0
  1208 10/05/20 08:00    2      60.25 UNDOTBS2                2        0          0       0.0        0.0       90           0            0           0.0           90         0
  1208 10/05/20 08:00    2      60.25 SYSTEM                  3        0          0       0.0        0.0        2           0            0           0.0            2         0
  1209 10/05/20 09:00    2      59.89 SYSAUX                  1        7          0       1.4        1.0      185           0            0           0.0          192         0
  1209 10/05/20 09:00    2      59.89 UNDOTBS2                2        0          0       0.0        0.0       91           0            0           0.0           91         0
  1209 10/05/20 09:00    2      59.89 SYSTEM                  3        1          0       0.0        1.0        5           0            0           0.0            6         0
  1210 10/05/20 10:00    2      59.87 SYSAUX                  1       20          0       2.5        1.0      194           0            0           0.0          214         0
  1210 10/05/20 10:00    2      59.87 UNDOTBS2                2        0          0       0.0        0.0       88           0            0           0.0           88         0
  1210 10/05/20 10:00    2      59.87 SYSTEM                  3        0          0       0.0        0.0        4           0            0           0.0            4         0
  1211 10/05/20 11:00    2      60.00 DG14D7                  1   112984         31      21.0       63.5        0           0            0           0.0       112984        31
  1211 10/05/20 11:00    2      60.00 DG14D1                  2      382          0       5.8        1.0    15317           4         2257          37.0        15699         4
  1211 10/05/20 11:00    2      60.00 UNDOTBS2                3        0          0       0.0        0.0      263           0           44           0.0          263         0
  1211 10/05/20 11:00    2      60.00 SYSAUX                  4       12          0       1.7        1.0      239           0            0           0.0          251         0
  1211 10/05/20 11:00    2      60.00 SYSTEM                  5       44          0       2.0        1.0      101           0         1160         121.1          145         0
  1212 10/05/20 12:00    2      60.02 SYSAUX                  1       20          0      60.0        1.0      298           0            0           0.0          318         0
  1212 10/05/20 12:00    2      60.02 UNDOTBS2                2        0          0       0.0        0.0      176           0            0           0.0          176         0
  1212 10/05/20 12:00    2      60.02 SYSTEM                  3       41          0       3.9        1.0       27           0            0           0.0           68         0
  1213 10/05/20 13:00    2      60.25 DG14D1                  1      259          0       3.7        1.0    30061           8         1068          29.0        30320         8
  1213 10/05/20 13:00    2      60.25 SYSAUX                  2        9          0       2.2        1.0      265           0            0           0.0          274         0
  1213 10/05/20 13:00    2      60.25 UNDOTBS2                3        0          0       0.0        0.0      249           0            3          10.0          249         0
  1213 10/05/20 13:00    2      60.25 SYSTEM                  4       17          0     222.9        1.0       45           0         1881         192.4           62         0
  1214 10/05/20 14:00    2      59.80 SYSAUX                  1       44          0       4.1        1.0      285           0            0           0.0          329         0
  1214 10/05/20 14:00    2      59.80 UNDOTBS2                2        0          0       0.0        0.0      160           0            1           0.0          160         0
  1214 10/05/20 14:00    2      59.80 SYSTEM                  3        1          0      20.0        1.0       10           0            0           0.0           11         0
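A rough sanity check on the DG14D7 reads in SNAP_ID 1211: with ~63.5 blocks per read and an assumed 8 KB block size (the block size isn't stated in this post, so that's my assumption), each read is about half a megabyte, i.e. large multiblock scan IO, which is what you'd expect from the full-scan data load demo:

```python
# Back-of-the-envelope numbers for DG14D7 in SNAP_ID 1211.

block_size_kb = 8        # ASSUMED block size; not stated in the report
blks_per_read = 63.5     # Av Blks/Rd column
reads         = 112984   # Reads column
snap_dur_s    = 3600     # 60-minute snapshot

read_size_kb = blks_per_read * block_size_kb
print(round(read_size_kb))                                 # ~508 KB per read
print(round(reads / snap_dur_s))                           # ~31, matching Av Reads/s
print(round(reads * read_size_kb / 1024 / snap_dur_s, 1))  # ~15.6 MB/s via this tablespace
```

Note that ~15.6 MB/s is well below the ~56 MB/s IO read rate in the workload report for the same snap; presumably the remainder is accounted elsewhere (e.g. the offloaded smart scan IO), but that's speculation on my part.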

Datafile IO statistics:

Below are the datafiles with read/write activity across SNAP_IDs. You may notice that tablespaces DG14D7 and DG14D1 on disk group +DISKVOL1 contain only user data, while +DISKVOL3 contains the SYSTEM, SYSAUX, and UNDO tablespaces.

                                                                                                                    AWR File IO Report

                         i
                         n
       Snap              s       Snap
  Snap Start             t        Dur                                                                                           IO                IOPS                                      IOPS                                              IOPS
    ID Time              #        (m) TS                   File# Filename                                                     Rank    Reads Av Reads/s Av Rd(ms) Av Blks/Rd   Writes Av Writes/s Buffer Waits Av Buf Wt(ms) Total IO R+W Total R+W
------ --------------- --- ---------- -------------------- ----- ------------------------------------------------------------ ---- -------- ---------- --------- ---------- -------- ----------- ------------ ------------- ------------ ---------
  1208 10/05/20 08:00    2      60.25 SYSAUX                   2 +DISKVOL3/rtldb/datafile/sysaux.299.714681829                   1        0          0                           203           0            0           0.0          203         0
  1208 10/05/20 08:00    2      60.25 UNDOTBS2                 4 +DISKVOL3/rtldb/datafile/undotbs2.260.714681979                 2        0          0                            90           0            0           0.0           90         0
  1208 10/05/20 08:00    2      60.25 SYSTEM                   1 +DISKVOL3/rtldb/datafile/system.300.714681827                   3        0          0                             2           0            0           0.0            2         0
  1209 10/05/20 09:00    2      59.89 SYSAUX                   2 +DISKVOL3/rtldb/datafile/sysaux.299.714681829                   1        7          0       1.4        1.0      185           0            0           0.0          192         0
  1209 10/05/20 09:00    2      59.89 UNDOTBS2                 4 +DISKVOL3/rtldb/datafile/undotbs2.260.714681979                 2        0          0                            91           0            0           0.0           91         0
  1209 10/05/20 09:00    2      59.89 SYSTEM                   1 +DISKVOL3/rtldb/datafile/system.300.714681827                   3        1          0       0.0        1.0        5           0            0           0.0            6         0
  1210 10/05/20 10:00    2      59.87 SYSAUX                   2 +DISKVOL3/rtldb/datafile/sysaux.299.714681829                   1       20          0       2.5        1.0      194           0            0           0.0          214         0
  1210 10/05/20 10:00    2      59.87 UNDOTBS2                 4 +DISKVOL3/rtldb/datafile/undotbs2.260.714681979                 2        0          0                            88           0            0           0.0           88         0
  1210 10/05/20 10:00    2      59.87 SYSTEM                   1 +DISKVOL3/rtldb/datafile/system.300.714681827                   3        0          0                             4           0            0           0.0            4         0
  1211 10/05/20 11:00    2      60.00 DG14D7                  15 +DISKVOL1/rtldb/datafile/dg14d7.501.714759895                   1   112984        187      21.0       63.5        0           0            0           0.0       112984        31
  1211 10/05/20 11:00    2      60.00 DG14D1                  13 +DISKVOL1/rtldb/datafile/dg14d1.500.714758119                   2      382          1       5.8        1.0    15317          25         2257          37.0        15699         4
  1211 10/05/20 11:00    2      60.00 UNDOTBS2                 4 +DISKVOL3/rtldb/datafile/undotbs2.260.714681979                 3        0          0                           263           0           44           0.0          263         0
  1211 10/05/20 11:00    2      60.00 SYSAUX                   2 +DISKVOL3/rtldb/datafile/sysaux.299.714681829                   4       12          0       1.7        1.0      239           0            0           0.0          251         0
  1211 10/05/20 11:00    2      60.00 SYSTEM                   1 +DISKVOL3/rtldb/datafile/system.300.714681827                   5       44          0       2.0        1.0      101           0         1160         121.1          145         0
  1212 10/05/20 12:00    2      60.02 SYSAUX                   2 +DISKVOL3/rtldb/datafile/sysaux.299.714681829                   1       20          0      60.0        1.0      298           0            0           0.0          318         0
  1212 10/05/20 12:00    2      60.02 UNDOTBS2                 4 +DISKVOL3/rtldb/datafile/undotbs2.260.714681979                 2        0          0                           176           0            0           0.0          176         0
  1212 10/05/20 12:00    2      60.02 SYSTEM                   1 +DISKVOL3/rtldb/datafile/system.300.714681827                   3       41          0       3.9        1.0       27           0            0           0.0           68         0
  1213 10/05/20 13:00    2      60.25 DG14D1                  13 +DISKVOL1/rtldb/datafile/dg14d1.500.714758119                   1      259          0       3.7        1.0    30061          50         1068          29.0        30320         8
  1213 10/05/20 13:00    2      60.25 SYSAUX                   2 +DISKVOL3/rtldb/datafile/sysaux.299.714681829                   2        9          0       2.2        1.0      265           0            0           0.0          274         0
  1213 10/05/20 13:00    2      60.25 UNDOTBS2                 4 +DISKVOL3/rtldb/datafile/undotbs2.260.714681979                 3        0          0                           249           0            3          10.0          249         0
  1213 10/05/20 13:00    2      60.25 SYSTEM                   1 +DISKVOL3/rtldb/datafile/system.300.714681827                   4       17          0     222.9        1.0       45           0         1881         192.4           62         0
  1214 10/05/20 14:00    2      59.80 SYSAUX                   2 +DISKVOL3/rtldb/datafile/sysaux.299.714681829                   1       44          0       4.1        1.0      285           0            0           0.0          329         0
  1214 10/05/20 14:00    2      59.80 UNDOTBS2                 4 +DISKVOL3/rtldb/datafile/undotbs2.260.714681979                 2        0          0                           160           0            1           0.0          160         0
  1214 10/05/20 14:00    2      59.80 SYSTEM                   1 +DISKVOL3/rtldb/datafile/system.300.714681827                   3        1          0      20.0        1.0       10           0            0           0.0           11         0

Top SQLs:

I wanted to know the top SQLs running on the server. To my surprise, as I was investigating the output, the number 1 SQL was the massive data load executed during the demo (SNAP_ID 1211).

This direct-path load (INSERT /*+ APPEND */) was using an external table over flat files (multiple small files of similar size, totaling roughly 50GB) staged on DBFS (Database File System) and loading them into a compressed table. This is very similar to the Data Warehouse data loading best practice mentioned here.

What's interesting is that it was able to process the 199 million rows in just 27 minutes! And take note of the low AAS (0.39) for this operation. It would be nice to see the SQL Monitor report for this execution, as well as the output of my AWR scripts on the other RAC nodes, and yes, the final "select count(*)..." on the target table.

BTW, I've briefly explained the output of the AWR Top SQL script in this reply to Oracle-l.

                                                                                                                    AWR Top SQL Report

                         i                                                                 Ela
                         n                                                                Time
       Snap              s    Snap                         Plan                 Ela        per        CPU         IO        App        Ccr    Cluster                                                                               PX       A
  Snap Start             t     Dur SQL                     Hash                Time       exec       Time       Wait       Wait       Wait       Wait                                 Direct                            Parse   Server       A Time SQL
    ID Time              #     (m) ID                     Value Module          (s)        (s)        (s)        (s)        (s)        (s)        (s)          LIO          PIO       Writes         Rows     Exec      Count     Exec       S Rank Text
------ --------------- --- ------- --------------- ------------ -------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ------------ ------------ ------------ ------------ -------- ---------- -------- ------- ---- ------
  1211 10/05/20 11:00    2   60.00 gz90qgc3f72fa     1967845001 SQL*Plus    1411.13    1411.13     565.92     305.14       0.00      64.05     347.28       536400           93        99690    199553620        1         17       16    0.39    1 INSERT
  1213 10/05/20 13:00    2   60.25 gz90qgc3f72fa     1967845001 SQL*Plus    1359.58    1359.58     443.98     291.29       0.00      16.64     518.73       466179           17       292337    198131707        1         17       16    0.38    1 INSERT
  1211 10/05/20 11:00    2   60.00 4tb00t5k8jc41     3807662380 SQL*Plus    1306.22     326.56     999.78     307.32       0.00       0.00       0.00     13140180     13107338            0            3        4         20       12    0.36    2 select
  1050 10/05/13 18:00    2   59.94 g2cd0wdckd7rn     3807662380 SQL*Plus     785.48                575.09     210.45       0.00       0.00       0.00      7226484      7208526            0            0        0          8        8    0.22    1 select
  1049 10/05/13 17:00    2   59.93 g2cd0wdckd7rn     3807662380 SQL*Plus     785.42                577.15     208.40       0.00       0.00       0.00      7191030      7173160            0            0        0          8        8    0.22    1 select
  1213 10/05/20 13:00    2   60.25 0kkhhb2w93cx0     2170058777              469.05       0.29       3.21       0.00       0.00      11.60     452.32        32076            0            0         1610     1610       1609        0    0.13    2 update
  1211 10/05/20 11:00    2   60.00 2nszgcfgcgvts      666733144 SQL*Plus     418.08     418.08     151.28     233.01       0.00       0.03       0.00         1782           17            0            1        1         17       16    0.12    3 select
  1211 10/05/20 11:00    2   60.00 0kkhhb2w93cx0     2170058777              215.14       0.17       1.37       0.00       0.00       4.70     208.65        12228            0            0         1263     1263       1263        0    0.06    4 update
  1212 10/05/20 12:00    2   60.02 4tb00t5k8jc41      597344198 SQL*Plus     125.51     125.51       1.75     123.71       0.00       0.00       0.00       519885       518076            0            1        1          5        4    0.03    1 select
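Putting the top SQL's numbers together gives a feel for the load rate. Using the row count and elapsed time from the report above, plus the ~50GB flat-file volume quoted earlier (that total is the figure from the post, not something measured here):

```python
# Load rate of the demo's direct-path INSERT (SNAP_ID 1211).

rows      = 199553620    # Rows column for SQL_ID gz90qgc3f72fa
ela_s     = 1411.13      # Ela Time (s) for the same SQL
source_gb = 50           # approximate flat-file volume quoted in the post

print(round(rows / ela_s / 1000))        # ~141 thousand rows/s
print(round(source_gb * 1024 / ela_s))   # ~36 MB/s of flat-file ingest
```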

Well, this is really different from what Greg Rahn has done on the Exadata V2, loading 7.8 billion rows in 275 seconds! That's because of a whole lot of factors: bigger bandwidth, faster CPUs, the Sun machines, new cool stuff on the V2, compression, table definition, number of columns, data types, etc.

But this is astounding compared to the data load rates I'm seeing on other Data Warehouse environments... :)

And this is where the importance of a balanced configuration kicks in. On some environments I see storage saturation (worst case, disks 100% busy with high service times and deep queues) and low storage bandwidth causing low data load rates and longer response times, affecting the whole ETL process. As an example, you can see from the figures below that even if you have the right amount of CPU (16 powerful CPUs), if the IO subsystem can't cope with the application's requirements you clearly have a bottleneck. This is a linear regression I made on IO read and write MB/s (on Direct Path Read and Large Writes, respectively) in relation to the scheduled ETL run times and daily workload patterns.
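For the curious, a minimal sketch of the kind of regression described above: ordinary least squares of IO throughput against a workload driver. The sample points below are made up for illustration; the real inputs were AWR direct path read / large write rates across snapshots, which aren't reproduced in the post.

```python
def ols(xs, ys):
    """Return slope and intercept of the least-squares line y = a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    b = my - a * mx
    return a, b

# HYPOTHETICAL points: (concurrent ETL streams, direct path read MB/s).
# The throughput flattens at the high end -- the IO subsystem saturating.
xs = [1, 2, 4, 8, 16]
ys = [60, 115, 210, 300, 320]

slope, intercept = ols(xs, ys)
print(round(slope, 1), round(intercept, 1))
```

When the fitted line flattens like this while CPU headroom remains, that's the signature of the IO bottleneck described above.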

[Figures: linear regression plots of IO read MB/s and IO write MB/s]

Having seen how Exadata behaves, and having learned about the architecture, administration, IORM (IO Resource Management), monitoring, and migration paths (although I'm lacking hands-on experience and there are more things I still have to learn!), I can say that it's no longer a black box... it has already been demystified ;) And I have to say, hands down, it's the Exadata Storage Server Software that makes the difference: without it, all the storage servers would just behave like traditional storage, and it's that cool stuff that lets each of them run at full speed given the bandwidth limits.

Also, there's some stuff I'm curious about in Exadata V2 or V1:

1) With most of the work done underneath, I wonder what the saturation point of the storage servers is, and what will be the trigger to add another storage server or Exadata rack.

2) And how will it behave with IORM (IO Resource Management) applied, with production and test databases sharing the Exadata storage servers, more like a large-scale consolidation of databases... hmm.

3) I wonder how this will change the capacity planning space, because you now have storage servers that can work smart and work hard, making the difference for the leap in performance... hmm.





9 comments

  1. As the author of collectl I'm always interested in performance, and one thing I always find fascinating is the correlation of activities across a system. When I look at some of your output, which only seems to sample once an hour, I wonder what's happening to other system resources at the times you highlighted. Perhaps the real question is: when you report a disk load, what is the sample time? Is it the snap time of 1 hour? Experience shows you have to get pretty fine-grained for the numbers to be meaningful, closer to the 5-15 second range. The other thing is it can also be very useful to see those same resources prior to and after the times you're showing, as a single point in time may not show what's really going on.

    Are the disks running at their maximums and might there be an opportunity for increasing throughput by configuration changes? Did the increase in disk rates exactly correlate with queries? What’s happening to memory utilization, particularly slabs, since they can fluctuate quite a lot under heavy load. Or what about memory fragmentation as reported by /proc/buddy? Lots of other things.

    A couple of thoughts come to mind that may or may not make sense, but if you could plot your data against collectl's, some interesting things might show up. But the question of HOW to do that can get complicated. You can certainly load collectl's data into a spreadsheet and plot it there, but it's kind of hard work. On the other hand, if you could append your data to collectl's, you could define some additional plots using colplot.

    Not being an Oracle person I'm not really sure what's possible and what isn't, but if you can get any of these stats in real time, you can relatively easily write a custom module that will allow collectl to read that data directly and display [or even plot] that data along with all of collectl's other data natively. If you're a perl programmer and are interested I could probably walk you through it. But as I said, I don't know if it's worth the effort or not.

    -mark

    • Hi Mark,

      Yes, the snap interval is 1 hour, and I can say that the latency numbers can be pretty bad when we drill down to 5-second intervals; it is very likely that the devices are running at their maximum.

      Sorry to reply just now... I've been thinking about this for quite a while. I made a few test cases and will blog my observations and what I've learned from them.

      The initial post is here:

      http://karlarao.wordpress.com/2010/07/05/oracle-datafile-io-latency-part-1/

      All the best..

  2. Pingback: Oracle datafile IO latency – Part 1 « Karl Arao's Blog

  3. Pingback: Oracle Clinic

  4. Storage can be shared by multiple types of workloads and multiple databases
     - Sharing lowers administration costs
     - Sharing leads to more efficient usage of storage
     - But, workloads may not happily coexist
     - ETL jobs interfere with DSS query performance
     - One production data warehouse can interfere with another
