Wednesday, March 13, 2013

How to Avoid Application Failures in the Cloud: Part 5

In this final post of the series of five, we'll look at a real life example of how Axway has used the features that I've described over the first four posts to create a secure, scalable and resilient service offering.

Apologies for taking so long to post this final part of the 'Cloud Failure' series, but I was waiting for my employer (Axway) to prepare and publish a whitepaper version of these posts on the Axway website. You can download the whitepaper from here - it does require (very brief) registration.

Putting It All Together — A Real-Life Example


We’ll finish this series of five blog posts with a look at a real-life example of how the features we have discussed are used to provide a highly secure and reliable service built on the Amazon EC2 cloud infrastructure — the Axway Cloud.

The Axway Cloud provides a set of business integration capabilities including B2B and managed file transfer (MFT) interactions. To illustrate how the previously described Amazon EC2 features can be combined to deliver an enterprise-scale service, we’ll look at the Axway Cloud MFT Service.

The Axway Cloud MFT Service provides a complete platform for secure, auditable and managed transfer of critical business files between two parties, whether they are in the same organization or in different organizations. This comprehensive cloud-based solution dramatically simplifies deployment and management by providing a fully configured system running on a highly scalable and flexible cloud infrastructure. The deployment architecture of the Axway Cloud MFT Service is illustrated in Figure 4.

Figure 4 - Axway Cloud MFT Service Deployment Architecture

Security groups: The Axway Cloud MFT Service is a classic multi-tiered application. Each tier is isolated and protected using EC2 security groups.
  • The outward-facing Edge security group opens the appropriate ports for the protocols selected by the user (e.g. port 21 for FTP, port 22 for SFTP, port 80 for HTTP, port 443 for HTTPS, and so on). These ports are open to any source, so there is no IP source filtering.
  • The ST security group restricts inbound traffic to port 4455 and only from Edge Service instances in the Edge security group. All other inbound traffic to the ST Service instances is blocked by the security group filters.
  • The DB security group opens port 1521 to the ST Service instances in the ST security group. All other inbound traffic is blocked.
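The tier isolation described in the list above can be modelled in a few lines. This is a minimal sketch in plain Python, not the AWS API: the group names (`edge`, `st`, `db`), the `Rule` type and the `is_allowed` helper are hypothetical, and the Edge group lists only a subset of the example ports from the text.

```python
from dataclasses import dataclass
from typing import Optional

ANY = "any"  # stands for "any source" (no IP source filtering)


@dataclass(frozen=True)
class Rule:
    port: int
    source: str  # ANY, or the name of the security group allowed to connect


# Hypothetical model of the three tiers; anything not listed is blocked.
SECURITY_GROUPS = {
    "edge": [Rule(22, ANY), Rule(80, ANY), Rule(443, ANY)],
    "st": [Rule(4455, "edge")],
    "db": [Rule(1521, "st")],
}


def is_allowed(target_group: str, port: int,
               source_group: Optional[str] = None) -> bool:
    """Return True if inbound traffic to target_group on port is permitted.

    source_group is the security group of the caller, or None for
    traffic arriving from outside the deployment.
    """
    for rule in SECURITY_GROUPS.get(target_group, []):
        if rule.port != port:
            continue
        if rule.source == ANY or rule.source == source_group:
            return True
    return False
```

In this model, `is_allowed("st", 4455, "edge")` is permitted, while the same port requested from outside the deployment, or a database connection attempted directly from the Edge tier, is refused.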

Elastic Load Balancing (ELB) and auto scaling: ELB instances are used to share the application load over the Edge Service instances and also over the ST Service instances. In each case, the minimum number of instances running is two, and auto scaling is used to automatically increase the number of instances as the load on the application increases.
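As a rough illustration of the scaling rule, the sketch below computes how many instances a tier should run for a given load. Only the minimum of two instances comes from the text; the upper bound, the per-instance capacity and the function name are assumptions made for the example, and real auto scaling policies are configured in AWS rather than written like this.

```python
import math

MIN_INSTANCES = 2            # from the text: at least two instances per tier
MAX_INSTANCES = 10           # assumption: an upper bound to cap cost
REQUESTS_PER_INSTANCE = 500  # assumption: load one instance can handle


def desired_instances(current_load: int) -> int:
    """Return how many instances should run for the given load."""
    needed = math.ceil(current_load / REQUESTS_PER_INSTANCE)
    # Never drop below the minimum, never exceed the maximum.
    return max(MIN_INSTANCES, min(MAX_INSTANCES, needed))
```

With these assumed numbers, a quiet period still keeps two instances running, and a load of 1,200 requests grows the tier to three.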

Elastic Block Store (EBS): The ST Service instances store configuration information, log files and file transfer restart data in a local Elastic Block Store (EBS) volume. The EBS volume is shared with all service instances. Periodic snapshots of the EBS volume (which are effectively incremental backups of the volume) are replicated to a standby copy of the MFT Service residing in a separate Amazon Availability Zone. This standby copy of the MFT Service acts as a disaster recovery instance in case of a complete failure of the primary, or production, copy of the service.

Similarly, the Oracle DB Server uses another EBS volume for a database instance. A snapshot of this volume is copied to the disaster recovery instance of the MFT Service to replicate the database information in case of a failure in the primary Availability Zone.

Disaster recovery: Axway’s Cloud Operations team has also created a set of scripts that utilize Amazon’s Web Services APIs to verify that the MFT Service instances in the disaster recovery Availability Zone are available and ready to start up if the primary Availability Zone fails. These scripts run on a regular and frequent basis, giving Axway the additional confidence that, in the case of a complete Availability Zone failure, the disaster recovery mechanism will switch over to the remote MFT Service instances and that these instances will provide continuous service to Axway’s customers.
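The kind of readiness check described here might look something like the following sketch. It is not Axway's script: the actual scripts call Amazon's Web Services APIs, whereas this example leaves data collection out and only shows the decision logic. The `StandbyStatus` fields, the instance names and the snapshot age limit are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class StandbyStatus:
    """Hypothetical summary collected for one standby instance."""
    name: str
    reachable: bool
    last_snapshot_copied: datetime


def readiness_problems(statuses: List[StandbyStatus], now: datetime,
                       max_snapshot_age: timedelta) -> List[str]:
    """Return a list of readable problems; an empty list means ready."""
    problems = []
    for status in statuses:
        if not status.reachable:
            problems.append(f"{status.name}: standby instance is not reachable")
        if now - status.last_snapshot_copied > max_snapshot_age:
            problems.append(f"{status.name}: latest snapshot copy is too old")
    return problems
```

Running a check like this on a schedule, and alerting whenever the list is not empty, is one way to find out that the standby copy has drifted before it is actually needed.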

Summary


The Axway Cloud MFT Service utilizes Amazon EC2 features to provide a cloud-based service that is not only highly secure, but is also designed to be resilient in the event of individual component failures or a complete failure of a data center, such as those suffered by Amazon data centers in Virginia and Ireland. The result is an enterprise-scale service that enables organizations to exchange files containing confidential and business critical information securely and reliably in the cloud.

Conclusion


Despite some people believing that applications running in the cloud never fail and are always available, component failure or a wider problem at a data center can cause outages just like they can for on-premise applications. To help ensure availability and performance for cloud applications like the Axway Cloud MFT Service, cloud infrastructure providers including Amazon have delivered a set of infrastructure features that allow you to design and build secure, scalable and resilient applications that will meet the needs of your organization.

Wednesday, February 13, 2013

Ten Things to Think About When Moving to the Cloud

Moving applications to the cloud can bring great benefits to your organization and increase the flexibility and responsiveness of your IT department. However, when making this move, there are lots of issues to think about. For example, how secure is my data in the cloud? How reliable is the cloud-based application? What is the actual cost? How can I migrate my existing data to the cloud? And so on.

In this post, we’ll discuss some of these issues and look at the questions that you should ask before you make the move to the cloud. This list is not intended to be exhaustive and, depending upon your particular circumstances, some of these things might not apply – or there might be other things that you need to add to the list.

1. Objectives/Goals
What are the goals and objectives of moving to the cloud? Is it to reduce costs, or reduce time-to-deployment, or free up IT resources for other tasks? If you don’t know what your goals are, then how do you know whether moving to the cloud has been successful or not? If you don’t know your goals, then why are you moving to the cloud at all – just because it seems like a cool idea?
List your goals and objectives, prioritize them, and quantify them. This will give you a measuring stick to determine if the cloud is successful for you.

2. Security
Security is important – whether it is securing data stored in the cloud (“data at rest”) or data moving between cloud apps and on-premise apps (“data in motion”). Security breaches can be very embarrassing and can lead to financial loss (and regulatory penalties if the breach affects data covered by privacy laws).
Wherever the data is stored or in motion, it is your data and therefore your responsibility to ensure that it is secure and protected. Just because your app is in the cloud and supplied to you by a cloud provider does not absolve you of responsibility for the security of your data. You need to make sure that your cloud provider is securing and protecting the data on your behalf, whether the data is at rest or in motion.
So you should ask your cloud application provider how they secure your data in the cloud. Do they encrypt the data? What about when the data is in motion? How is it protected then? How do they protect the application against security breaches? Do they perform threat analysis and penetration testing of the application? Are they willing to discuss the results with you? The answers to these types of questions will give you an understanding of whether your application provider is securing and protecting your valuable data.

3. Reliability
Make sure that the cloud application reliability (as defined in your SLA) matches your needs. If your cloud application is, say, an HR application, then having 12x5 availability is probably going to be OK. But what if the application is a part of a mission critical business process that needs to be available 24x7? Or what if the application must have guaranteed availability and reliability at certain times of the day e.g. at the close of trading for financial applications or at closing time for retail applications (for end of day processing, in both cases)?
Check that your cloud application SLA meets your particular requirements with regard to reliability and availability – don’t just assume that, because the application is in the cloud, it’s always available. Ask your cloud provider how they provide the reliability and availability – do they have load balancing and a high availability architecture in the cloud? Or are there points of failure that could halt the whole application?

4. Disaster recovery
Disaster Recovery (DR) is somewhat related to reliability and availability, but in a slightly different context. DR covers the situation where there is a total failure of the data center where your cloud application is running. This happened recently when Amazon lost power to one of their data centers in Virginia due to an electrical storm in the U.S. Mid-Atlantic region. This power outage affected services such as Netflix, Instagram, and Pinterest – albeit only for a few hours. Last year, a transformer failure knocked out the power to Amazon and Microsoft data centers in Dublin, Ireland. It took up to 2 days to restore some of the services running in these data centers.
How would you be able to withstand a loss of your cloud application for an extended period of time? How would this affect your ability to do business? If your cloud application is important, then look for a cloud provider that offers a DR capability. You will almost certainly have to pay for this functionality, but it will be worth it if it prevents extended application outages.
Ask your cloud provider how they support DR e.g. is the DR application located in a physically different data center to protect against problems such as power outages or natural disasters? What is the replication strategy between the ‘live’ data center and the backup (DR) data center? How frequently is the data replicated? How long of an outage at the live data center is required before triggering the DR application to start up? How long does it take for the DR application to start up? What is the recovery strategy for interrupted processes and transactions?
The answers to these questions will help you to determine whether the cloud provider’s DR capability is suitable for your needs.
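To see why the replication frequency, trigger delay and start-up time matter, it can help to turn the provider's answers into two worst-case figures. The sketch below is a simplification with hypothetical function names: it ignores the time the replication copy itself takes and assumes the DR application starts only after the trigger delay has passed.

```python
from datetime import timedelta


def worst_case_data_loss(replication_interval: timedelta) -> timedelta:
    """Data written just after one replication run is lost if the site
    fails just before the next run, so the worst case is one interval."""
    return replication_interval


def worst_case_outage(trigger_delay: timedelta,
                      startup_time: timedelta) -> timedelta:
    """Time without service: waiting for the DR trigger, then start-up."""
    return trigger_delay + startup_time


# Example with assumed answers from a provider.
loss = worst_case_data_loss(timedelta(hours=1))
outage = worst_case_outage(timedelta(minutes=15), timedelta(minutes=20))
```

If the resulting figures are larger than your business can tolerate, that is the point to raise with the provider before signing up.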

5. Cloud operations – who is monitoring your application?
When you use a cloud application, you are effectively outsourcing some of the administration and monitoring of the application to a third party i.e. your cloud provider. You need to understand what application and system health metrics the cloud provider is monitoring on your behalf. You need to make sure that they are monitoring enough metrics to catch problems as soon as possible – preferably before they affect the performance or availability of the cloud application. Items to be monitored by the cloud provider should include:
  • CPU Thresholds
  • Load balancing functionality
  • Disk utilization and volume capacity
  • Database metrics, e.g. utilization, I/O, etc.
  • Application specific metrics, e.g. listening ports, process PIDs, etc.
  • URL availability for user interface access
  • Security attacks:
    •  Denial of service attacks
    •  Brute force attacks
    •  Illegal log-in attempts
    •  Unauthorized file access, e.g., non-root user attempting access to host files
    •  Virus scanning
These metrics – and alerts based on metric thresholds – should provide suitable information about whether the application is running or not and whether anything abnormal has occurred (e.g. a runaway application consuming all CPU cycles). However, system or application level metrics and alerts don’t tell you whether the application is performing as promised. If application performance – whether response time, throughput, etc. – is important, then you need to ask your cloud provider about mechanisms to monitor performance. For example, this could be sending a test transaction through the application on a periodic basis to measure the processing time or response time for the transaction. If they don’t have any such mechanism or process, then how do you know whether your application is meeting your performance criteria?
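The periodic test transaction mentioned above can be sketched as a small probe that times one transaction and compares it with a target. This is an illustration only: `run_probe`, `ProbeResult` and the threshold are hypothetical names and values, and the transaction itself (for example, uploading a small test file) is passed in as a function rather than implemented here.

```python
import time
from typing import Callable, NamedTuple


class ProbeResult(NamedTuple):
    ok: bool
    seconds: float
    detail: str


def run_probe(transaction: Callable[[], None],
              max_seconds: float) -> ProbeResult:
    """Run one test transaction and compare its duration with the target."""
    start = time.perf_counter()
    try:
        transaction()
    except Exception as exc:  # a failed transaction counts as a failed probe
        return ProbeResult(False, time.perf_counter() - start, f"failed: {exc}")
    elapsed = time.perf_counter() - start
    if elapsed > max_seconds:
        return ProbeResult(False, elapsed, "too slow")
    return ProbeResult(True, elapsed, "within target")
```

A scheduler would call `run_probe` every few minutes and raise an alert when `ok` is false, which covers both an unavailable application and one that is running but slow.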

6. Self-serve dashboards
Although moving an application to the cloud means that someone else will be monitoring the health of your application, it doesn’t completely absolve you of all of that responsibility. After all, if something goes wrong with the cloud application, the users within your organization won’t be able to contact the cloud provider to find out what’s happening – they’ll need someone within the organization who can tell them what’s going on.
This means that your cloud application will need some type of self-service dashboard which can provide a view on to the application and its health. Using the dashboard, you will be able to notify your users if there are any problems, such as slow performance, the application being unavailable, etc. This, in turn, will result in a better user experience as they will be informed about problems rather than just staring at a blank screen.

7. Transparent pricing
Before you sign up for any cloud-based application make sure that you fully understand the pricing model. This sounds obvious but often people are caught unaware by ‘hidden charges’ that were not fully explained before they signed up.
For example, if your cloud application costs are based on a ‘per user’ subscription, is this for concurrent users or named users? What is the cost of adding more users? How easy is it to add more users?
If your application costs are based on throughput or transactions, what is the cost of going over your quota or limits? Are there ‘overage’ charges (similar to cell phone plans)? Or are you automatically bumped up to a higher quota plan with a higher subscription cost? Is there a grace period if this is a one-off occurrence? Or are the additional fees (overage or plan upgrade) automatic?
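The difference between overage charges and an automatic plan upgrade is easy to check with a little arithmetic. The sketch below compares the two models; the function names and all prices and quotas in the example are invented for illustration and do not describe any real provider's pricing.

```python
def monthly_cost_with_overage(base_fee: float, quota: int, used: int,
                              overage_per_unit: float) -> float:
    """Cost when usage above the quota is billed per extra unit."""
    extra_units = max(0, used - quota)
    return base_fee + extra_units * overage_per_unit


def monthly_cost_with_upgrade(plans: list, used: int) -> float:
    """Cost when the account is bumped to the cheapest plan covering usage.

    plans is a list of (quota, fee) tuples; raises ValueError if none fits.
    """
    fitting = [fee for quota, fee in plans if quota >= used]
    if not fitting:
        raise ValueError("usage exceeds the largest plan")
    return min(fitting)


# Assumed figures: 1,200 transfers against a 1,000-transfer quota.
overage_bill = monthly_cost_with_overage(100.0, 1000, 1200, 0.5)
upgrade_bill = monthly_cost_with_upgrade([(1000, 100.0), (5000, 300.0)], 1200)
```

With these made-up numbers, the one-off spike costs less under the overage model than under the automatic upgrade, which is why it is worth asking which model applies and whether there is a grace period.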
Also consider the commitment period for your subscription. If this is month-by-month with no lock in, then this is not going to be a problem. However, if you pay for a year’s subscription in advance, what happens if you decide to cancel before the year is up? Do you get a refund for the unused portion of your annual subscription?
The answers to these questions might not affect your decision to use a particular cloud application, but you should at least be aware of the costs before you sign up, so that you’re not hit with unexpected fees when it’s too late to change.

8. Migration
Unless you are starting from scratch with a brand new application, the chances are that you’ll have some existing data that you want to convert and migrate to your new cloud application. You’ll need to understand what bulk data import capabilities are provided with your application. Most applications offer some form of data import capability and they are usually fairly easy and straightforward to use.
However, you shouldn’t underestimate the amount of effort involved in migrating your data to the cloud. The data import mechanisms usually require that your data is in a certain format before it can be imported, so you will need to extract and convert your existing data to match the import format. You should also take this opportunity to cleanse and refresh the data before you import it into the cloud application. If you’re importing CRM data, this is a good chance to go through the data and refresh any old and out-of-date customer information or delete any obsolete product or pricing data. If you need to enhance or extend the data, e.g. by merging in data from another source, now is a good time to do this.
Migration should also include training users on the new application and planning their switchover to the cloud. Is this going to be a ‘big bang’ approach where everyone will move to the new application at the same time and any old applications will be switched off? Or will you have a phased approach to moving users to the new cloud application? Both approaches have pros and cons and you need to decide which one to take.
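The extract, convert and cleanse step described in this section can be sketched in a few lines. The example below assumes a CSV export and a CSV import format; the column names on both sides, the cutoff rule and the `convert_customers` function are hypothetical, and a real import format will be whatever your cloud application documents.

```python
import csv
import io
from datetime import date


def convert_customers(source_csv: str, cutoff: date) -> str:
    """Convert an export to the import layout, dropping stale records.

    Input columns (hypothetical): Name, Email, LastContact (YYYY-MM-DD).
    Output columns (hypothetical): full_name, email.
    """
    reader = csv.DictReader(io.StringIO(source_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["full_name", "email"],
                            lineterminator="\n")
    writer.writeheader()
    seen = set()
    for row in reader:
        email = row["Email"].strip().lower()
        last_contact = date.fromisoformat(row["LastContact"].strip())
        if last_contact < cutoff or email in seen:
            continue  # skip out-of-date and duplicate records
        seen.add(email)
        writer.writerow({"full_name": row["Name"].strip(), "email": email})
    return out.getvalue()
```

Even a small script like this makes the cleansing rules explicit, so they can be reviewed with the business owners of the data before the import is run.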

9. Integration
While using applications in the cloud can bring great benefits to your organization, you won’t truly realize those benefits until you integrate your cloud application with your existing applications and business processes. If you don’t do this then you get application silos that result in fragmented business processes with too many manual steps or data duplication and data rekeying. If you consider a basic application, such as an HR application, even this needs to be integrated with other applications, such as your talent management application, your IT systems for provisioning of email accounts and system logins, etc. Additionally, your HR application is just one part of your new employee on-boarding business process, so your HR application needs to be integrated into the process flows for on-boarding your new employees.
When planning to integrate your new cloud applications with your existing applications and business processes, you need to consider how you access the data and functionality of the new application. Is this via an API? If so, what standards and formats does the API use? The most common cloud application APIs are Web Services based, but do they support SOAP/XML or REST protocols? If the cloud application has a ‘batch’ import/export interface, what sort of data format does it support (e.g. XML, CSV, or JSON)? Are these protocols and formats compatible with your existing applications? Do you have in-house knowledge of the protocols and formats?
Answering these types of questions will help you to prepare the integration of your new cloud application with your existing applications and processes.
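To get a feel for what the format question means in practice, the sketch below reads the same record from JSON, CSV and XML using only the Python standard library. The record layout (`id`, `name`) and the three helper functions are made up for the example; a real cloud application will define its own schema.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def from_json(text: str) -> dict:
    """Parse a JSON object into a dictionary."""
    return json.loads(text)


def from_csv(text: str) -> dict:
    """Return the first data row of a CSV document with a header line."""
    return dict(next(csv.DictReader(io.StringIO(text))))


def from_xml(text: str) -> dict:
    """Map each child element of the root to a key/value pair."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}
```

Each format needs its own parsing code and its own handling of types (CSV and XML deliver everything as text), which is why it pays to check that your existing applications and in-house skills cover the formats on offer.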

10. Managed Services
Last but not least, we have managed services. Cloud providers are increasingly complementing their cloud-based applications with managed services offerings to help you to get the best out of the application. These managed services can range from offerings to get you started quickly, such as migration services or ‘quick start’ services, all the way through to managing the full solution on your behalf. These managed services can complement the skills and expertise that you have in-house and make it easier to get value from your new application.
Managed services are also a good way to free the IT staff from the routine tasks of managing and administering the cloud application. Hopefully, with the time freed up they can focus on more valuable work for the organization. Ask your cloud provider about any managed services that they offer. If these are of interest, find out about the scope of the services – exactly what does the cloud provider do? And what is still left for you to do? For example, if you are using a full managed service to run and administer the cloud application on your behalf, are there any limits on the work provided? Is it based on hours of effort? Or number of tasks? Or will they do anything requested? How do you request work to be done (such as adding a new user to the application)? How long does it take before they perform the requested work? How can you get urgent work performed as a priority task e.g. if your CEO wants a report immediately?
This type of information should be supplied, by your cloud application provider, as a definition of the scope of a particular managed service offering. Make sure that you understand what you are getting before you sign up for a managed service for your new application.

So there it is – ten things to think about when moving to the cloud. As I said earlier, this list is not exhaustive. Your particular circumstances might add new things to the list and make some of the above things irrelevant. However, if you use this list as a starting point, you will be able to ask your cloud providers the right questions, and your move to the cloud will come from knowledge and an understanding of the challenges rather than from blind optimism and hope.

Wednesday, January 23, 2013

Enterprise vs. Consumer Cloud – Does it Matter?

Typically when people think about the cloud and what it has to offer, they picture a vast, amorphous thing in which everything is basically the same, but that’s not the reality. Just as all cars aren’t created equal – a BMW luxury sedan is not the same as a Ford Fiesta, for example – applications in the cloud are different in some important ways.

Cloud applications can be broken into two broad categories: 1) Consumer, or prosumer (for the “professional consumer”) and 2) Enterprise. For the sake of this discussion we’ll focus on file sharing and collaboration applications in these categories.

So what’s the difference, and does it really matter?

Consumer applications are typically low-cost, mass-market solutions, usually available in both free and inexpensive premium versions. The main goal of consumer solutions is to simplify tasks such as file and photo sharing on a limited, low-volume basis.

Enterprise solutions are more robust, with strong emphasis on privacy, security and reliability, and can be customized for specific environments. They are designed for heavy-usage patterns, such as large files and high volumes.

If you’re sharing or collaborating on non-sensitive, non-confidential files with a small number of people — whether it’s an internal team or a small number of external trading partners — and you’re looking for a free or low-cost solution, then a consumer application may be the right choice.

It’s safe to say that within the right circumstances, applications like Google Docs, Dropbox and YouSendIt can be used to great effect by small businesses.

However, if you need reliability, or if the information you’re sharing is confidential, these consumer solutions may be a poor choice. Consumer applications support some level of data encryption, but they do not provide the right level of security for most enterprises. Nor do they provide chain-of-custody information — that is, a receipt proving that the recipient you sent the data to actually received it, when they received it, and that only they received it. This type of audit trail is required by regulations worldwide.

If your business handles confidential data such as patient health information (PHI), which is subject to strict privacy regulations, you require the capability to control and record who has access to the information at all times. Consumer applications typically do not provide this level of sophisticated security and control, mainly because it is not required by the mass market of casual users they target. Similarly, reliability and high availability are not requirements for most consumer-grade applications in the cloud. For example, if you cannot access and upload photos of your dog at a certain time, it isn’t a big issue.

On the other hand, for enterprises and large organizations that have to guarantee file transfers and information exchange, it is a big issue. Reliability and availability are not guaranteed with consumer applications – in fact, if you look at the Dropbox or YouSendIt terms of service, you’ll see that whether you pay for the service or not, availability is simply not guaranteed.

Companies like Axway, which offers an MFT Service in the cloud as an enterprise solution, promise 99.9% availability in their SLAs, guaranteeing your information is available, accessible, and where it’s supposed to be.
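It is worth translating an availability percentage into actual downtime. The sketch below does the arithmetic; the function name is hypothetical, and whether an SLA measures availability per month or per year (and what it excludes, such as planned maintenance) is an assumption you would need to check against the SLA itself.

```python
def allowed_downtime_hours(availability_percent: float,
                           period_hours: float) -> float:
    """Hours of downtime permitted in a period at the given availability."""
    return period_hours * (100.0 - availability_percent) / 100.0


# 99.9% measured over a 365-day year: about 8.76 hours of downtime.
yearly_hours = allowed_downtime_hours(99.9, 365 * 24)

# 99.9% measured over a 30-day month: about 43.2 minutes of downtime.
monthly_minutes = allowed_downtime_hours(99.9, 30 * 24) * 60
```

So even a 99.9% commitment allows for most of a working day of downtime per year, which is still far more than a service with no availability guarantee at all is obliged to deliver.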

Regarding throughput and performance, again, consumer applications are designed for less-critical, less-robust use. Business users may send two or three files per week successfully, or share a document that’s accessed four or five times per day using these services, but large files are just not handled well.

Enterprise customers can share multi-gigabyte files, such as CAD/CAM engineering schematics, via services like Axway’s MFT Service in the cloud. These enterprise applications are designed to handle thousands of file transfers per hour, and very large files as well. Axway also provides best-in-class technology such as large enterprise clustering (LEC) and file-transfer accelerators which are built into the cloud technology to support enterprise-sized needs.

Not all applications in the cloud are created equal. In scenarios where security, non-repudiation, availability and performance matter, it’s time to look beyond the consumer applications to enterprise-grade applications. Otherwise, you may end up with a service that looks good on the surface, but in the end does not meet your needs and even causes damage – such as data leaks or unmet SLAs with your customers. Go straight to the applications and services that are going to provide the capabilities and performance you need, and don’t settle for less.

(This post was first published at http://blogs.axway.com)