The Oracle Australia and New Zealand Middleware and Technology Blog.

Monday, May 4, 2009

The Joys of MAA...

Oracle's Maximum Availability Architecture

Greetings All,

This is my first post for the Red Room. My name is David Centellas and I'm a field consultant specialising in Oracle Database and options, with Maximum Availability Architecture and Data Warehousing on the side. I recently had the pleasure of sitting through a presentation given by Alex Gorbachev (MD of Pythian Group) at InSync '09 at the Hilton hotel in Sydney. Alex touched on some key points that many clients are asking about today, mainly around Maximum Availability Architecture, Data Guard and Automatic Storage Management.

Here is a snippet in case you missed it:

One of the most interesting points Alex made was the uptake of Extended RAC into production at client sites. He does go on to say it is a fairly advanced configuration; however, it is working, and working well, in production. I thought I would take a bit of time to cover what to consider when planning an Extended RAC. Here are some guidelines that need to be taken into account. As usual, priority should be given to the following Oracle best practices when implementing such architectures and, as Alex mentions in the video, “Identify what you really need for your business or organisation.”

Steven Chan had a really good diagram of MAA on his blog:

All we would have to do is think of Extended RAC as an extension of the “Database Tier”. All the rest of the principles remain the same.

As Alex mentions, “The hardest part (is) as you separate the datacenters, the latency between the site(s) increase and this is where the challenges are coming from.” Network latency is one of the biggest issues when it comes to extended-cluster scenarios; however, there are other rules of thumb that need to be addressed:

i) Extended RAC over distances of 10-20 km is possible without being prohibitive to the project (the cost of a fibre link is another story ;)). That's not to say longer-distance Extended RACs are not possible; there are customers running Extended RAC over distances of 50 km, however specific measures and QoS need to be in place to make the configuration viable (as well as the use of dark fibre technologies for maximum throughput).

ii) Redundancy is key! Make sure there is never a single point of failure, and this includes the remote link! Dual NICs, multiple RAC nodes, disk redundancy, power redundancy, UPS, etc.

iii) Amount of data. Is the amount of changing data too much for the link to keep up with? Get this one wrong and you will forever be chasing your tail trying to catch up with the primary RAC.
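To sanity-check point iii), a quick back-of-the-envelope calculation helps. Here is a minimal Python sketch; the 40 MB/s change rate and the 50% derating factor are hypothetical assumptions for illustration, not figures from any particular system:

```python
def link_keeps_up(change_rate_mb_s, link_gbit=1.0, effective_fraction=0.5):
    """Rough check: can the inter-site link absorb the data change rate?

    change_rate_mb_s   -- sustained change (e.g. redo) rate in MB/s
    link_gbit          -- nominal link speed in Gbit/s
    effective_fraction -- fraction of theoretical bandwidth realistically
                          usable (an assumption; measure your own network)
    """
    usable_mb_s = link_gbit * 1_000_000_000 / 8 / 1_000_000 * effective_fraction
    return change_rate_mb_s <= usable_mb_s, usable_mb_s

# Hypothetical 40 MB/s change rate against a 1 Gbit link at 50% efficiency
ok, budget = link_keeps_up(change_rate_mb_s=40)
print(ok, budget)  # True 62.5 -- 40 MB/s fits within ~62.5 MB/s usable
```

If the check fails, no amount of tuning on the database side will stop the remote site falling behind.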

As Alex mentions “Try to be as simple as possible” in architecting such solutions.
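Alex's latency point can also be put into rough numbers. Light travels at roughly 200,000 km/s in glass, giving about 5 microseconds per kilometre one way; this is a common rule of thumb, not a figure from the presentation, and real links add switching and queueing delay on top:

```python
def fibre_rtt_ms(distance_km, us_per_km=5.0):
    """Estimate round-trip propagation delay over a fibre link.

    us_per_km is the one-way delay per kilometre (~5 us/km is a common
    rule of thumb for light in glass); equipment adds more in practice.
    """
    return 2 * distance_km * us_per_km / 1000.0

for d in (10, 20, 50):
    print(f"{d} km -> {fibre_rtt_ms(d):.1f} ms RTT (propagation only)")
```

At 10-20 km the propagation delay is a fraction of a millisecond, which is why that range is workable; at longer distances it starts to bite into every cache-fusion round trip.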

Let's not forget:

1 Gbit
= 1 000 000 000 bits /sec
= 125 000 000 bytes /sec
= 119.209 MiB /sec
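The arithmetic above can be reproduced in a couple of lines (note that the 119.209 figure is the byte rate expressed in binary megabytes, i.e. MiB):

```python
bits_per_sec = 1_000_000_000             # 1 Gbit/s
bytes_per_sec = bits_per_sec / 8         # 8 bits per byte
mib_per_sec = bytes_per_sec / (1024 * 1024)  # binary megabytes

print(f"{bytes_per_sec:,.0f} bytes/sec")  # 125,000,000 bytes/sec
print(f"{mib_per_sec:.3f} MiB/sec")       # 119.209 MiB/sec
```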

This is the theoretical limit of gigabit; however, real-world scenarios usually show a significant impact of up to 50% on this theoretical limit. Calculate, calculate, calculate. I remember being at a client site a couple of years back trying to diagnose a throughput problem on their systems. We bet the pizza we had ordered (it was about midnight in a government department in New Zealand) on the cause of the problem. My money was on a gigabit bottleneck; his was on hard disk latency.

At the end of the day we performed our calculations and figured out it was a bit of a mixture of both, so we split the cost of the pizza. Still, this illustrates the classic school of thought that ‘there’s no way we’ll be flooding a gigabit link’, and yet these days I see saturated links most of the time.

1 comment:

Alex Gorbachev said...

Wow, what a plug! :-D
Thanks Dave.

Re the 1 Gbit interconnect: agreed, it's usually more than enough for OLTP applications. It's not easy to saturate the interconnect with *OLTP* traffic - CPU saturation and LMS process saturation usually kick in earlier as the bottleneck. Never say never, but it's generally the case indeed.

With Extended RAC, the added latency adds even more to it. We can't solve serialization issues by increasing throughput, so in this case you can't beat physics; the only solution is to avoid serialization points, which takes us back to application design. :)