This article was authored and contributed by Kong Yang, Head Geek at SolarWinds.
Transitioning portions of IT applications and services into the cloud is something that’s starting to become a part of daily life for IT professionals across government agencies. The benefits of utilizing existing infrastructure and application services while leveraging the cloud are increasingly attractive. But it’s still a big departure from the traditional implementations where you could walk down the hall and hug the server and systems that provide those applications and services.
Moving to the cloud requires a planned approach with more thought and consideration than traditional hardware deployments. Agencies must factor in security and compliance requirements as well.
Hybrid Cloud Scenarios
There are three general scenarios that combine cloud-based and on-premises applications and services.
The first is an architected solution that uses components of on-premises data centers as well as cloud service providers. One example is application servers in the cloud while the backend database servers reside on-premises. Government data can be sensitive in nature and requires special considerations. This type of hybrid approach allows agencies to govern sensitive data in their own data centers. Applications, however, need to remain highly responsive no matter how many clients connect, or by what method, and cloud implementations can provide that type of flexibility and availability.
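The split between tiers can be sketched as a simple routing rule: stateless application work stays in the cloud, while anything touching the database goes to the on-premises endpoint over a private link. This is a minimal illustration only; the host names, workload labels, and routing function are all hypothetical, not part of any particular product.

```python
# Minimal sketch of a split-tier hybrid deployment. The application tier
# runs in the cloud; data access is routed to an on-premises endpoint so
# sensitive records never leave the agency data center. All names are
# hypothetical.

CLOUD_APP_TIER = {"host": "app.example-cloud.gov", "role": "application"}
ONPREM_DATA_TIER = {"host": "db.internal.agency.example", "role": "database"}

def endpoint_for(workload: str) -> dict:
    """Return the tier that should handle a given class of work."""
    if workload in ("render", "session", "api"):
        return CLOUD_APP_TIER          # stateless, latency-sensitive work
    if workload in ("query", "write", "report"):
        return ONPREM_DATA_TIER        # anything that touches agency data
    raise ValueError(f"unknown workload type: {workload}")
```

The point of the sketch is the policy, not the mechanism: the decision of which tier serves a request is made explicitly, so the governance boundary around sensitive data is enforced in one place.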
The second is a redundant implementation where an application is available in both an on-premises data center and in the cloud. This approach is good for disaster recovery and also enhances an agency’s ability to quickly pivot support for remote and mobile users. Data replication is a key consideration in this scenario, but replicating data among several distributed databases is geek chic these days.
The third is about scalability. An agency may have an on-premises application that performs just fine 95% of the time, but the other 5% of the time it finds itself resource constrained by peak loads or other demand spikes. The challenge for the agency, though, is that the 5% doesn’t justify investment in additional on-premises hardware that will sit idle the other 95% of the time. Cloud is the ideal solution to this problem, as it allows agencies to rapidly ramp up resources to accommodate periods of peak demand, and pay only for what is actually used during that 5% of the time. The ability to quickly scale on demand is a key benefit of cloud.
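The burst-scaling arithmetic behind this scenario is straightforward: spin up just enough supplemental cloud capacity to cover the shortfall between current load and on-premises capacity, and nothing when on-premises capacity suffices. A minimal sketch, with illustrative units (any consistent load measure works):

```python
import math

def supplemental_instances(current_load: float,
                           onprem_capacity: float,
                           per_instance_capacity: float) -> int:
    """How many cloud instances to add so total capacity covers the
    current load. Returns 0 when on-premises capacity is sufficient,
    which is the case ~95% of the time in the scenario above."""
    shortfall = current_load - onprem_capacity
    if shortfall <= 0:
        return 0
    # Round up: a fractional shortfall still needs a whole instance.
    return math.ceil(shortfall / per_instance_capacity)
```

Because capacity is only provisioned against the measured shortfall, the agency pays for the peak and not for idle headroom.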
The key to successfully working through any one of these scenarios is in understanding the requirements of each approach and planning ahead, both logistically (IT Operations) and financially (Business Operations), to be able to deliver those requirements.
Key Strategies for Implementation
Regardless of which scenario fits your agency’s needs, you’ll want to partner with a cloud service provider that has achieved Federal Risk and Authorization Management Program (FedRAMP) authorization. This means the provider has met federal requirements for security and compliance assessment, authorization, and continuous monitoring for its cloud products and services, which should help clear a few hurdles with your security operations team.
Scenario one is highly dependent on the data path from client to application server, as well as from application server to data server. The first key objective here is ensuring that dedicated bandwidth exists between the cloud and the on-premises data center. In addition, redundant bandwidth should be provisioned across multiple network providers. It’s absolutely necessary to minimize downtime on the link between the cloud and the on-premises data center.
Active monitoring of bandwidth consumption on the data paths should be implemented, with alerting configured for when Quality of Service (QoS) falls below acceptable levels. What those levels are will vary from implementation to implementation; every organization must determine its own tolerance for QoS degradation.
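The alerting described above amounts to classifying utilization samples against organization-specific thresholds. A minimal sketch, assuming timestamped throughput samples and a known link capacity; the warning and critical percentages here are illustrative placeholders, since, as noted, each organization must set its own tolerance levels:

```python
def qos_alerts(samples, link_capacity_mbps, warn_pct=0.7, crit_pct=0.9):
    """Classify bandwidth samples against utilization thresholds.

    samples: list of (timestamp, throughput_mbps) tuples.
    Returns a list of (timestamp, severity) alerts; samples below the
    warning threshold produce no alert. Threshold defaults are
    illustrative, not recommendations.
    """
    alerts = []
    for timestamp, mbps in samples:
        utilization = mbps / link_capacity_mbps
        if utilization >= crit_pct:
            alerts.append((timestamp, "CRITICAL"))
        elif utilization >= warn_pct:
            alerts.append((timestamp, "WARNING"))
    return alerts
```

In practice this logic would live inside a monitoring platform rather than standalone code, but the structure is the same: measure, compare against tolerances, and alert on breaches only.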
Scenario two also has a dependency on the data path between the cloud and the on-premises data center. The key here is how much delay in the data replication between the two parallel environments can be tolerated. If, for example, the cloud-based environment is primarily read-only, supporting staff in the field, and the actual data changes are done from headquarters, then the tolerance for replication delays may be fairly high. On the other hand, if the data transactions are coming from both sources, then this is pretty much the same issue that has existed with distributed databases historically.
Important to the success of this implementation will be active monitoring of the data replication activities, and a proven contingency plan for when data replication is disrupted beyond acceptable tolerances.
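One practical wrinkle in monitoring replication is avoiding alerts on a single transient lag spike while still catching sustained disruption. A minimal sketch of that debouncing logic, assuming periodic lag measurements in seconds; the tolerance and the consecutive-breach count are illustrative knobs each agency would tune to its own acceptable tolerances:

```python
def check_replication(lag_samples, tolerance_s, sustained_breaches=3):
    """Evaluate a series of replication-lag samples (seconds).

    Returns "invoke-contingency" once the lag exceeds tolerance for
    `sustained_breaches` consecutive samples, "watch" if the series
    ends mid-breach, and "ok" otherwise. A single spike that recovers
    resets the breach counter rather than paging anyone.
    """
    breaches = 0
    for lag in lag_samples:
        if lag > tolerance_s:
            breaches += 1
            if breaches >= sustained_breaches:
                return "invoke-contingency"
        else:
            breaches = 0
    return "ok" if breaches == 0 else "watch"
```

The "invoke-contingency" result is where the proven contingency plan mentioned above would be triggered, whether that means failing traffic over to one environment or pausing writes until replication catches up.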
One other potential area for consideration here is how to handle users that work both in the field and on your enterprise network. Is it acceptable for those users to work from two different application environments? If so, this will also impact your tolerances for data replication. It may also be that these users can simply continue to use cloud services for the application, even though they may be physically present at one of your facilities.
Scenario three is the most complicated to implement, and engaging with a qualified service provider who has experience in implementing an on-demand, scale-out type of environment is a good idea. The key aspect of this implementation is the transparency of inbound connections being rolled over, or rerouted, from the primary site in the on-premises data center to the supplemental resources in the cloud.
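At its core, the transparent rollover described above is an admission-control decision made per inbound connection: fill the on-premises primary to its capacity, then spill new connections to the supplemental cloud resources without the client noticing. A minimal sketch of that decision (the real mechanism would live in a load balancer or DNS layer, and the tier names here are hypothetical):

```python
def route_connection(active_primary: int, primary_capacity: int) -> str:
    """Route a new inbound connection: prefer the on-premises primary
    until it is at capacity, then spill over to cloud resources."""
    if active_primary < primary_capacity:
        return "on-prem-primary"
    return "cloud-supplemental"
```

The hard part in practice is not this decision but keeping it transparent, ensuring session state, authentication, and data access behave identically on both sides of the split, which is why the article recommends engaging a service provider experienced with scale-out environments.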
Many organizations will determine that a preferable alternative is to move the entire application to a cloud-based solution and then the scaling can be dealt with exclusively within the environs of the cloud service provider. However, some agencies may not have this choice due to security considerations for physical location of data storage or the need to maintain local access to the application, perhaps due to usage volumes.
In Conclusion: Plan, Plan, Plan!
Regardless of which of these scenarios you might be contemplating, or even a scenario not discussed here, it’s absolutely critical to have a plan, and to involve both technology and business stakeholders in developing it. Identify and analyze all contingencies, and define performance and QoS expectations for every aspect of the environment: from end-user experience, device performance, and connectivity, through the public network connections, all the way to the back-end servers and data stores.