Do you find data the way you stored it – or store data the way you find it?

data storage — Image credit: rawf8 / Shutterstock.com

Do you find data the way you stored it, or do you store data the way you find it? This is not a silly question when determining the scale and performance of information systems.

Depending on the answer, the result will be two very different information engines, differing in functional architecture, schema design, and scale and performance factors. Given that our systems are reaching a gigantic scale, the latter method (storing data the way you find it) is probably the best choice.

This diagram illustrates a composite services platform engine I developed. (I have let the patent expire, having moved on to the social inclusion cell-based system ‘cuuble’.) The engine design, described and referenced below, may provide ideas for addressing some of the blockchain (BC) scaling and performance issues.

With regard to subsystems, I am researching the various issues with blockchain and have noticed that some implementations use an underlying SQL database. I suspect quite a few table joins are required for searching. NoSQL databases are also used. I also notice that separate blockchain search tools are available, and I assume these are application-specific with respect to the transaction schema design.

When it comes to contracts (user registration/sessions) and access control methods, it is again not quite clear how different operational environments would apply user validation and registration schemes. There are blockchain requirements for X.500/509 PKI and validation, and for a range of application-level modules, e.g. messaging users and the like. So integrated or ‘composite’ architectures are key for scale and performance reasons in both cases.

The biggest issue with X.500 is that it was widely criticized as a heavyweight protocol, which it never was. It is an abstract object-oriented hierarchical (directory) information system that can be distributed and/or replicated, and is governed by namespace design and the security requirements over that. Therefore, it is an infrastructure information system technology which is architected and designed using (systems) identity and governance engineering decisions as required by the business.

I was once asked in my early days of directories why anyone would want a directory that holds 100 million entries. The answer was simple: because they want 100 million online customers to log on and do business!

I mention BC and X.500 directories not as one replacing the other, but as distributed and replicated information infrastructure technologies. I see common features and problems in very large scale deployments and operations: dependencies on PKI, service delivery, integrated systems requirements for health and care systems, IoT, distributed governance and social focus.

Some background

I worked on X.500/509 in the 1980s with many others in ISO/ITU, started a directory development (open directory), worked with PKI on scale, defense and the like, and then moved into large scale directory-enabled service delivery platforms (SDPs). From that work I gained considerable experience of the scale and performance issues with customer-centric systems and X.500 object-based directories, identity management, services portfolio management, policy-based access controls, and distributed directory objects mapped to RDB data stores.

And from that experience, I had a rethink about how to make SDP architectures and their implementations much better integrated, with improved scale and performance.

In 2002 I started work on the technical design of a prototype/demonstrator service delivery platform and called it composite adaptive directory service (CADS). I filed for a patent in 2003, which was granted, but I have let it expire. I have moved on to social cell systems now.

I was then engaged with a large team in a convergence program to rebuild the customer-facing front end of a 22-million user system, the SDP.

A note on the SDP scale. Approximate numbers: 22m users, 12m devices, 10 directory objects per user, 60-100 attributes per object and, say, 50 bytes per attribute. This represents about one terabyte of data, which some are saying is the BC limit at the moment.
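The terabyte figure can be checked with a few lines of arithmetic. This is only a rough sketch using the approximate numbers quoted above. It assumes the device objects are counted among the 10 objects per user (the text does not say), and the variable names are mine.

```python
# Rough capacity estimate from the approximate SDP figures quoted above.
# Assumption: device objects are already included in the 10 objects per user.
USERS = 22_000_000
OBJECTS_PER_USER = 10
ATTRS_PER_OBJECT = (60, 100)  # low and high estimates
BYTES_PER_ATTR = 50


def estimate_bytes(attrs_per_object: int) -> int:
    """Total bytes for all users at a given attribute count per object."""
    return USERS * OBJECTS_PER_USER * attrs_per_object * BYTES_PER_ATTR


low, high = (estimate_bytes(a) for a in ATTRS_PER_OBJECT)
print(f"{low / 1e12:.2f} TB to {high / 1e12:.2f} TB")
```

The range straddles one terabyte, which is consistent with the "about one terabyte" figure before any indexing or storage overhead is added.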

The diagram above identifies the key SDP object information model and its external functions (billing and CRM). Subscribers, with their preferences, entitlements, device objects and message boxes, formed one major set of objects, and the products and services formed the other.

The data storage for the SDP user and services information was about a terabyte (the BC limit, some say). But we had to cater for a visible performance of up to 10,000 user authentications a second.

Composite adaptive directory service (CADS)

With the CADS work I developed a completely multi-function directory-enabled service delivery system that was memory based. And the multi-function agenda is the ‘composite’ part of ‘CADS’.

Within the CADS engine, there are dynamically scheduled data adaption functions that sort object classes, attribute types and directory tree layouts into the optimal layouts for memory table organization (segments). This is the ‘adaptive’ part of the design. Perhaps it’s an example of AI, one never knows. Within the adaption regime, for example, there is an algorithm that I called horizontal indexing: basically, the adaption function did not have to search for anything, it simply got the item. And I thought that having these functions placed into Xilinx FPGA devices (see patent claim 83) would be very useful.
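To make the "did not have to search, it simply got the item" idea concrete, here is a minimal sketch of the general principle. It is not the patented horizontal indexing algorithm, whose details are not given here. It only contrasts scanning entries as stored with arranging them in memory by the key they will be found with. The Entry class, the sample entries and the function names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical directory entry: a name, an object class and attributes."""
    dn: str
    object_class: str
    attributes: dict = field(default_factory=dict)


ENTRIES = [
    Entry("cn=alice,ou=subscribers", "subscriber", {"msisdn": "0400000001"}),
    Entry("cn=bob,ou=subscribers", "subscriber", {"msisdn": "0400000002"}),
    Entry("cn=handset-17,ou=devices", "device", {"owner": "cn=alice,ou=subscribers"}),
]


def find_by_scan(entries, object_class, attr, value):
    """Find data the way it was stored: walk every entry and test it."""
    return [
        e for e in entries
        if e.object_class == object_class and e.attributes.get(attr) == value
    ]


def build_lookup(entries, attr):
    """Store data the way it will be found: key entries by (class, value)."""
    lookup = {}
    for e in entries:
        if attr in e.attributes:
            lookup.setdefault((e.object_class, e.attributes[attr]), []).append(e)
    return lookup


# Built once (or rebuilt on a schedule); each later request is a direct fetch.
BY_MSISDN = build_lookup(ENTRIES, "msisdn")
hit = BY_MSISDN[("subscriber", "0400000002")]
print(hit[0].dn)
```

The scan costs time proportional to the number of entries on every request. The lookup pays that cost once, when the layout is built, and each request afterwards is a single keyed fetch.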

In work with RMIT and their computer/electronics department, a student’s thesis was to take a part of the CADS adaption indexing algorithm and place it in the FPGA. (I can look up the paperwork if there is any interest here.)

With the CADS design and its FPGA approach, I was hoping to get to 200,000 SDP operations per second.
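To put those rates in perspective, the sketch below converts them into a time budget per operation. It assumes, purely for illustration, that operations are handled one after another on a single pipeline. Any parallelism would relax the budget proportionally.

```python
def budget_microseconds(ops_per_second: int) -> float:
    """Time available per operation if operations are handled serially."""
    return 1_000_000 / ops_per_second


for label, rate in [("10,000 authentications/s", 10_000),
                    ("200,000 SDP operations/s", 200_000)]:
    print(f"{label}: {budget_microseconds(rate):.0f} microseconds each")
```

Under that assumption, the authentication target leaves 100 microseconds per operation and the FPGA target leaves 5, which is why avoiding a search step matters at this scale.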

The CADS US patent has expired. Hopefully it will be read to see why I think it might help the BC agenda, but I’m open to comments, obviously.

For those interested in the wwiteware SDP tech description, its architecture and all its functionality, click here.

Blockchain and directory services and SDP comparisons

Correctly or not, I see parallels with X.500 object trees and blockchain re: top level entries and child chains / subordinate objects, schema, access control management and the performance and the integrated application level functional demands. So perhaps the CADS patent may provide some ideas.

The next step is to consider operational scenarios for infrastructure at the national level, assess transaction rates and capacity issues, and perhaps overlay that with two abstractions: one being a service platform and the other perhaps a really big abstraction of an integrated health, care and social system that BC can support.

Where my head is now as to GDPR, social cells and privacy and the infrastructure issues:

Let us evolve that adaption and data indexing approach to information. Say we tag data attributes and objects with GDPR consent levels, purpose and context, legitimate use, even the GDPR Article reference/group identifiers, and we develop, in a composite adaptive way, an AI function for our information engines. Whenever our usage of information relates to GDPR, we automatically have the metadata at hand from the data engine to trigger onward personal data usage and procedures.
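A minimal sketch of what such tagging might look like follows. The field names, the consent values and the Article labels are hypothetical placeholders chosen for illustration. They are not a proposed schema and not legal guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedAttribute:
    """Hypothetical attribute carrying its own GDPR-related metadata."""
    name: str
    value: str
    consent_level: str        # illustrative values: "given" or "withheld"
    purposes: frozenset       # purposes the data may be used for
    article_refs: tuple = ()  # free-text GDPR Article labels


def usable_for(attributes, purpose):
    """Return only the attributes whose tags permit the stated purpose."""
    return [
        a for a in attributes
        if a.consent_level == "given" and purpose in a.purposes
    ]


PROFILE = [
    TaggedAttribute("email", "alice@example.org", "given",
                    frozenset({"billing", "marketing"}), ("Art. 6",)),
    TaggedAttribute("postal_address", "1 Example St", "given",
                    frozenset({"billing"}), ("Art. 6",)),
    TaggedAttribute("health_note", "(redacted)", "withheld",
                    frozenset(), ("Art. 9",)),
]

print([a.name for a in usable_for(PROFILE, "marketing")])
print([a.name for a in usable_for(PROFILE, "billing")])
```

Because the tags travel with the data, the engine can answer "may this be used for that purpose?" at the moment of access rather than through a separate lookup.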

I have mapped GDPR Articles onto a spreadsheet as to information definitions, predicates, schema, outcomes, etc., and am thinking of embracing GDPR in our data/information engines and support systems so we can build such privacy-based infrastructures as shown in the abstraction above. All comments appreciated.

Written by Alan Lloyd, co-founder cuuble and executive director of wwite.