Agencies face many challenges with data sharing, especially around validation, but federated AI is creating solutions for validating information. Matt Langan, Government Technology Insider’s podcast host, discussed the challenges agencies are facing with data sharing, and how AI solutions can help, with Kal Voruganti, Senior Fellow and Vice President at Equinix, and Scott Andersen, Distinguished Solution Architect at Verizon.
No time to read this article? You can listen to the podcast of this conversation here.
Matt Langan (ML): What are some of the challenges that federal government agencies are facing with respect to data sharing?
Scott Andersen (SA): The sad reality of data is the question, “is it reusable?” One of the things that I’ve been doing for many years throughout my IT career is trying to convince people to share information. I run into people who believe job security is a locked PDF file. You ask me a question, I write down the answer to that question, and instead of simply giving you the answer, I send you a locked PDF file. When I send you a locked PDF file, I’m saying that my information is critical, but also that my information is my career and my value to the organization. That’s one of the huge challenges that organizations face.
The biggest problem is that it’s hard to share information. For example, when we think about information about the COVID-19 vaccine, there’s a lot of data about that vaccine, and they’re still collecting data at a massive rate. Let’s say five years from now, there’s a new problem in the world, and you think the answer to that problem lies somewhere in the research done for the COVID-19 vaccination. If you don’t work for Pfizer, Moderna, or any of those big companies, you’re not going to get that information; you’ll get some of the information that was presented to the CDC, but not all of it. The reason for that comes back to the same concept: my knowledge, my value, and my contribution to the company are why I get a paycheck, because I have that information and no one else does.
Kal Voruganti (KV): When you get data, you need to make sure you have authoritative information. It’s almost like a Carfax for data or AI models. When you want to buy a secondhand car, the Carfax report has all of the information about past accidents, what work has been done on the car, and so on. Agencies need something similar in the case of data, or any AI models that are being acquired from external sources. It’s important to know who built the model, what data was used to train it, and so on.
The second big issue is the lack of good governance templates for sharing data between participants. In other words, organizations want to share data, but how do you figure out how to compensate somebody who brings in a certain type of data? How do you penalize a bad actor whose data quality has suddenly gone down? The whole notion of governing good and bad actors with respect to sharing data is another major issue that people are facing right now.
The last challenge is technology. There is a need for solutions that let people keep their raw data to themselves and exchange or trade only insights. You send the algorithm to the data’s location, let the algorithm work on that data within the agency’s four walls, and then share only the resulting insights.
ML: How is the ‘AI Anywhere’ solution helping with regards to data sharing and overcoming these challenges?
KV: I think people need to become more familiar with federated AI. Now all major companies, clouds, and AI platforms are advocating for the use of federated AI. Here’s an example of federated AI: there are three federal agencies, and they want to share their data to solve a common problem. You send the algorithm to each of those three federal agencies and build local models at each of those locations. Then, you bring them back to a central location and create a better global model. You ship that model to all these agencies, so that they can use it.
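The three-step workflow Voruganti describes — ship the algorithm to each agency, train local models on data that never leaves the premises, then average the local models into a better global model — can be sketched in a few lines. This is a minimal, hypothetical illustration of one round of federated averaging with simple linear models; it is not Equinix’s actual implementation, and real federated learning systems add many training rounds, sample weighting, and privacy safeguards.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_local_model(X, y):
    """Fit a local least-squares model; only the weights leave the agency."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Simulated private datasets for three agencies (illustrative only).
# Each agency's raw records stay behind its own security perimeter.
true_w = np.array([2.0, -1.0])
agency_data = []
for _ in range(3):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    agency_data.append((X, y))

# Step 1: the algorithm is sent to each agency; local models are built there.
local_weights = [train_local_model(X, y) for X, y in agency_data]

# Step 2: only the model weights travel to a central location, where they
# are averaged into a global model.
global_w = np.mean(local_weights, axis=0)

# Step 3: the global model is shipped back to all agencies for use.
print(global_w)  # close to the underlying pattern [2.0, -1.0]
```

Note that what moves over the network is `local_weights`, a handful of numbers per agency, rather than the raw records themselves.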
The key point is that the raw data never leaves the four walls, or security perimeter, of these different agencies. I think this concept is picking up steam, and the use of federated AI will become more and more prevalent. The second key concept is creating what’s called a demilitarized zone: data providers bring their data into this demilitarized, secure sandbox, and algorithm providers bring their algorithms into the same secure, neutral zone. Then the consumer can come in, run the algorithm on the data, and, with permission, take the results out. This paradigm is something the AI Anywhere solution provides.
SA: I think federated AI is the first step to a known good validation source. I always tell people that the Internet is an ocean of information: you’re standing above it on top of a 50-foot platform, and your mission is to dive off headfirst. The challenge is that you don’t know where you’ll land. You could land in 25,000 feet of water in the Mariana Trench, but you could also land in just a few inches of water. Since you’re jumping in headfirst, you have to make a tough decision. Most people say no, because it’s too much risk. However, federated AI presents the opportunity to validate information on the fly and to know that the information you receive is correct.
ML: What are some of the challenges organizations are facing with respect to receiving insights from their AI algorithms and their data in general?
SA: The reality today is that nobody builds applications factoring in latency. The latency that we sometimes have between organizations, and within the Internet as a whole, is troubling in terms of getting and using information. Most government agencies today have multiple locations. Some locations are high performing, and some don’t have high-performing network solutions. The inability to get data out of some locations prohibits the sharing of information.
We’ve talked about how important it is for organizations to be able to share information. The same is true with AI. I always tell people: you can build the greatest AI that’s ever been, you can create that infrastructure, but, unfortunately, if nobody can get to it, it’s useless. Having a structured information system that allows data to flow quickly from one location to another, and from one group of people to another, facilitates sharing the data and allows AI to crunch that information to build a better view and get it out quickly to more people.
KV: There are not enough data scientists, and there is also an impedance mismatch in communication between data scientists and subject matter experts. Today, in any organization’s AI workflow, 70 percent of the time is spent not on running the algorithm or figuring out which AI algorithm to use, but on cleaning the data, removing the noise, anonymizing it properly, and making sure it is compliant with all the regulations and security protocols.
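As a concrete sketch of the preparation work Voruganti describes, the snippet below drops incomplete records, clamps an outlier value, and pseudonymizes a direct identifier with a salted hash so raw identities never leave the agency. All field names, rules, and the salt are hypothetical illustrations, not any agency’s actual pipeline.

```python
import hashlib

# Salt kept inside the agency's security perimeter (illustrative value).
SALT = b"agency-local-secret"

def pseudonymize(value):
    """Replace an identifier with a stable, non-reversible token."""
    return hashlib.sha256(SALT + value.encode()).hexdigest()[:12]

def clean_record(record):
    """Drop noisy rows, clamp outliers, and pseudonymize identifiers."""
    if not record.get("ssn") or record.get("age") is None:
        return None  # incomplete rows are removed as noise
    return {
        "patient_id": pseudonymize(record["ssn"]),      # no raw SSN leaves
        "age": max(0, min(int(record["age"]), 120)),     # clamp outliers
    }

rows = [
    {"ssn": "123-45-6789", "age": 42},
    {"ssn": "", "age": 30},              # dropped: missing identifier
    {"ssn": "987-65-4321", "age": 300},  # outlier: age clamped to 120
]
cleaned = [r for r in (clean_record(r) for r in rows) if r]
print(len(cleaned))  # 2
```

Steps like these, repeated across many fields and regulations, are where the bulk of that 70 percent of the time goes.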
The whole data management challenge is hard. If your data is spread across multiple clouds and your private data centers, the location where you merge all of that data and host your AI stack should have high-speed connectivity and proximity to the clouds and the edge. Otherwise, the latency will kill you, and moving massive datasets between these different locations is going to be prohibitively expensive. The location of your AI stack is important, and that, along with the impedance mismatch between data scientists and subject matter experts, is a big challenge.
ML: How is the AI Anywhere solution helping in this regard?
KV: The AI Anywhere solution follows a paradigm called low code/no code. Essentially, the data science part has been automated, so it’s much easier for subject matter experts to quickly look at their data and gain insights. They don’t have to spend weeks talking to a data scientist and figuring out how to massage the AI algorithm. Latency is also very important. We are making the AI stack available globally across all the metro locations Equinix has, so that you don’t need to move data from the edge to the core and pay a lot of money for backhauling that traffic. From a latency standpoint, these data centers are very strategically positioned, and hosting your AI stack there makes a lot of sense.
SA: If we think about distributed AI and start pushing AI out into smaller and smaller bits, we break the AI up into smaller, capable functional areas and algorithms so that the AI can build and deliver the right answer. For example, think about how networks operate today; the world of networking has evolved radically. When I started doing networking, we only had Ethernet. There was no cellular data; for all intents and purposes, it was a wire, a network card, and a device. The reality of today’s world is that this is not the case anymore: you can take your cell phone and go anywhere at any time. The world has distributed the concept of telephones into cell phones in our pockets, and we can talk to people anywhere. The idea of connection has changed.
When we start to think about the value of connection and the value of low latency, AI Anywhere with small bits of AI floating around becomes more and more valuable. Some organizations and some places will always have less connectivity overall than other places. The reality is that there are going to be places on earth that are not going to have great reception. We talked today about digital validation, making sure we validate information, using AI as a validation tool, federating AI so that all of the AI instances are able to interact with each other, but also to know that the information is valid, appropriate, correct, and timely.
As we begin this process of pushing AI out further into the wild, we reduce the latency even more. As we reduce the latency, the information gets back to the people quicker. Knowledge management is the end game for AI: producing the right information so that when a human being gets it, it’s there at the right time, it’s the right answer, it solves the problem they’re experiencing, and they’re able to continue with whatever they were doing without having to worry about the problem they were facing.
To learn more about data sharing validation and federated AI, click here.
Click here to learn more about Verizon’s Professional and Managed Services.