With the proliferation of object-oriented programming in the 1990s, there was an increasing demand for serialized data in a cohesive (and sane?) format. The ask was simple. If John Doe as an employee was a logical object, it was not unreasonable to expect his details in a single file. What we got from the database was a bunch of files with no machine-understandable connection between them. At the consumption end, one had to write extensive programs to "join" those files into a cohesive object.

But the irony of the whole affair was that relational databases were unmanageable and irreplaceable at the same time. The OMG struggled hard for many years before screaming OMG and quitting. The object-oriented database remained a mirage, forever.

The impasse did not last long, however. The deal makers of the industry swung into action and found a compromise: XML was adopted as the intermediate language.

Before we delve deeper into the design principles of XML, we need to know what a multi-level data structure is.

This is our good old Windows folder:

This is how it looks after expanding 1 level:

After expanding 2 levels:
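
For readers who prefer code to screenshots, the same idea can be sketched in Python. This is only an illustration: the folder names are hypothetical, and the `expand` helper is an assumption introduced here, not something from the original article. Each folder is a dictionary whose values are its subfolders, so the structure nests to any depth.

```python
# Hypothetical folder tree: each key is a folder name,
# each value is a dictionary holding that folder's subfolders.
tree = {
    "Documents": {
        "Work": {"Reports": {}, "Invoices": {}},
        "Personal": {"Photos": {}},
    }
}


def expand(node, levels, depth=0):
    """Return indented folder names, descending at most `levels` levels below the top."""
    lines = []
    for name, children in node.items():
        lines.append("  " * depth + name)
        if depth < levels:
            lines.extend(expand(children, levels, depth + 1))
    return lines


print("\n".join(expand(tree, 0)))  # the folder alone
print("\n".join(expand(tree, 1)))  # expanded 1 level
print("\n".join(expand(tree, 2)))  # expanded 2 levels
```

Every level has the same shape as the one above it, which is exactly the property XML relies on.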

This is just another proof of one of the most enduring principles of science: Any matter can be broken down into its constituents. Scientists followed that path of investigation so rigorously and vigorously that within a span of two centuries or so they successfully reduced a delicious muffin into a few billion invisible, tasteless, odorless, formless particles.

Now let us apply that same reductionism to our dataset in the first part of this article.

The employee John Doe had these details at the first level:

emp_id,fname,lname,dob,doj,department,manager

46,John,Doe,12-Jun-1981,01-Mar-2016,Product Design,23

Please note that if we had 10 records instead of 1, the column names in the header row would appear only once. That arrangement saves space and reflects a real ledger book, too.

This is where XML first deviated from the conventional computing logic. It was proposed that each data element be tagged with an identifier (column name). Clarity was preferred over conciseness.

Tagging posed a problem. With multi-word or multi-line text, how do you mark the end of a value after attaching a tag at its beginning? The solution was to add a second tag with the same name but a slightly altered structure, which we will see shortly.

So in XML style the record mentioned above would look somewhat like this:

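    <emp_id>46</emp_id>

    <fname>John</fname>

    <lname>Doe</lname>

    <dob>12-Jun-1981</dob>

    <doj>01-Mar-2016</doj>

    <department>Product Design</department>

    <manager>23</manager>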

Newline characters are added for readability only. <> and </> are XML's way of marking whether a piece of text is an opening tag or a closing tag.
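
The tagging idea can be tried out with Python's standard library. The sketch below reads the header row and the record from the CSV form and wraps each value in an element named after its column. The enclosing `employee` tag and the `row_to_element` helper are assumptions made for this illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The ledger-style data: column names appear once, in the header row.
csv_text = (
    "emp_id,fname,lname,dob,doj,department,manager\n"
    "46,John,Doe,12-Jun-1981,01-Mar-2016,Product Design,23\n"
)


def row_to_element(row, tag="employee"):
    """Wrap every value of a CSV row in an element named after its column.

    The wrapper name `employee` is an assumption for this sketch.
    """
    elem = ET.Element(tag)
    for column, value in row.items():
        ET.SubElement(elem, column).text = value
    return elem


rows = list(csv.DictReader(io.StringIO(csv_text)))
employee = row_to_element(rows[0])
xml_text = ET.tostring(employee, encoding="unicode")
print(xml_text)
```

Note how the column name is now repeated twice for every single value, the price XML pays for preferring clarity over conciseness.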

XML adopts OOP's top-down approach. So, sitting at the top, we see that in addition to the attributes mentioned above we need two more: 1. pastEmployementHistories and 2. educations. In XML we express their presence, and the absence of their details, with empty tags.

The following is more complete:

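    <emp_id>46</emp_id><fname>John</fname><lname>Doe</lname><dob>12-Jun-1981</dob><doj>01-Mar-2016</doj><department>Product Design</department><manager>23</manager>
    <pastEmployementHistories></pastEmployementHistories>
    <educations></educations>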

Per XML convention you may have empty tags, which is an elegant way to express that structurally those elements are part of the whole even though their values are not available presently.
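
A small Python sketch shows how an empty tag behaves in practice. It assumes an enclosing `employee` element, which is a name chosen for this illustration. An empty element can be written in long form or in the equivalent self-closing short form, and after parsing it is still present in the tree, just without children.

```python
import xml.etree.ElementTree as ET

# Build a record whose two nested sections exist but carry no details yet.
employee = ET.Element("employee")  # wrapper name is an assumption
ET.SubElement(employee, "emp_id").text = "46"
ET.SubElement(employee, "pastEmployementHistories")
ET.SubElement(employee, "educations")

# Short form: <educations />; long form: <educations></educations>.
short_form = ET.tostring(employee, encoding="unicode")
long_form = ET.tostring(employee, encoding="unicode", short_empty_elements=False)
print(short_form)
print(long_form)

# Both forms parse to the same structure: the element is there, but empty.
parsed = ET.fromstring(long_form)
educations = parsed.find("educations")
print(educations is not None, len(educations))
```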

Now let us find the missing info.

Past employment history had two records:

emp_id,serial_no,employer_name,start_date,end_date,start_des,last_des

46,1,AT&T,15-Aug-2003,01-Oct-2011,Trainee,System Analyst

46,2,Amazon,02-Oct-2011,28-Feb-2016,Software Engineer,Architect

The resulting XML fragment is:

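    <emp_id>46</emp_id><serial_no>1</serial_no><employer_name>AT&amp;T</employer_name><start_date>15-Aug-2003</start_date><end_date>01-Oct-2011</end_date><start_des>Trainee</start_des><last_des>System Analyst</last_des>

    <emp_id>46</emp_id><serial_no>2</serial_no><employer_name>Amazon</employer_name><start_date>02-Oct-2011</start_date><end_date>28-Feb-2016</end_date><start_des>Software Engineer</start_des><last_des>Architect</last_des>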
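
One detail is easy to miss when building such fragments by hand: the ampersand in AT&T is a reserved character in XML and has to be written as `&amp;`. A library takes care of that automatically. The following is a minimal Python sketch. The per-record wrapper name `pastEmployementHistory` is hypothetical, chosen here only so that each record has an element of its own.

```python
import xml.etree.ElementTree as ET

columns = ["emp_id", "serial_no", "employer_name", "start_date",
           "end_date", "start_des", "last_des"]
records = [
    ["46", "1", "AT&T", "15-Aug-2003", "01-Oct-2011", "Trainee", "System Analyst"],
    ["46", "2", "Amazon", "02-Oct-2011", "28-Feb-2016", "Software Engineer", "Architect"],
]

histories = ET.Element("pastEmployementHistories")
for record in records:
    # "pastEmployementHistory" is a hypothetical wrapper name for one record.
    entry = ET.SubElement(histories, "pastEmployementHistory")
    for column, value in zip(columns, record):
        ET.SubElement(entry, column).text = value

# Serialization escapes the ampersand; parsing restores it.
xml_text = ET.tostring(histories, encoding="unicode")
print(xml_text)
round_trip = ET.fromstring(xml_text)[0].findtext("employer_name")
print(round_trip)
```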

Education, similarly, has two records:

emp_id,institution,accomplishment,graduation_yr,grade

46,Rockdate High School,High School Graduation,1999,A

46,Princeton University,Computer Science Major,2003,A

The resulting XML fragment is:

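    <emp_id>46</emp_id><institution>Rockdate High School</institution><accomplishment>High School Graduation</accomplishment><graduation_yr>1999</graduation_yr><grade>A</grade>

    <emp_id>46</emp_id><institution>Princeton University</institution><accomplishment>Computer Science Major</accomplishment><graduation_yr>2003</graduation_yr><grade>A</grade>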

After a bit of hectic assembly work with Ctrl+C and Ctrl+V, we have the final XML document:


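    <employee>
      <emp_id>46</emp_id><fname>John</fname><lname>Doe</lname><dob>12-Jun-1981</dob>
      <doj>01-Mar-2016</doj><department>Product Design</department><manager>23</manager>

      <pastEmployementHistories>
        <pastEmployementHistory>
          <emp_id>46</emp_id><serial_no>1</serial_no><employer_name>AT&amp;T</employer_name>
          <start_date>15-Aug-2003</start_date><end_date>01-Oct-2011</end_date>
          <start_des>Trainee</start_des><last_des>System Analyst</last_des>
        </pastEmployementHistory>
        <pastEmployementHistory>
          <emp_id>46</emp_id><serial_no>2</serial_no><employer_name>Amazon</employer_name>
          <start_date>02-Oct-2011</start_date><end_date>28-Feb-2016</end_date>
          <start_des>Software Engineer</start_des><last_des>Architect</last_des>
        </pastEmployementHistory>
      </pastEmployementHistories>

      <educations>
        <education>
          <emp_id>46</emp_id><institution>Rockdate High School</institution>
          <accomplishment>High School Graduation</accomplishment>
          <graduation_yr>1999</graduation_yr><grade>A</grade>
        </education>
        <education>
          <emp_id>46</emp_id><institution>Princeton University</institution>
          <accomplishment>Computer Science Major</accomplishment>
          <graduation_yr>2003</graduation_yr><grade>A</grade>
        </education>
      </educations>
    </employee>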

Please note that the wrapper tags, the ones that enclose each record and each group of records without corresponding to any column, are introduced artificially to maintain the structural integrity of the document.

Also note that we do not need "joins" to connect one record to another. The nested nature of the data structure ensures that inclusion is automatic, not artificially maintained by introducing some "non-data" such as emp_id.
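
To see the "no joins" point in action, here is a minimal Python sketch that reads a trimmed-down version of the document. The wrapper names `employee`, `pastEmployementHistory` and `education` are assumptions made for this illustration, and only a few fields are kept for brevity. The consumer simply walks down from the employee to the nested records and never has to match an emp_id.

```python
import xml.etree.ElementTree as ET

# Trimmed-down document; wrapper tag names are assumptions for this sketch.
document = """
<employee>
  <emp_id>46</emp_id>
  <fname>John</fname>
  <lname>Doe</lname>
  <pastEmployementHistories>
    <pastEmployementHistory>
      <employer_name>AT&amp;T</employer_name>
      <last_des>System Analyst</last_des>
    </pastEmployementHistory>
    <pastEmployementHistory>
      <employer_name>Amazon</employer_name>
      <last_des>Architect</last_des>
    </pastEmployementHistory>
  </pastEmployementHistories>
  <educations>
    <education>
      <institution>Princeton University</institution>
      <graduation_yr>2003</graduation_yr>
    </education>
  </educations>
</employee>
"""

employee = ET.fromstring(document)

# Nested records are reached by position in the tree, not by matching keys.
name = f"{employee.findtext('fname')} {employee.findtext('lname')}"
employers = [h.findtext("employer_name") for h in employee.iter("pastEmployementHistory")]
schools = [e.findtext("institution") for e in employee.iter("education")]
print(name, employers, schools)
```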