What's a Schema?


A schema is an organizational structure for information. The schema below lists the information we would like to collect about earmarks. It is the standard we propose for earmark reporting, and it applies to both earmark requests and approved earmarks.

In other words, it tracks everything transparency advocates would want to know about an earmark. With this schema in hand, you have a standard way of recording the earmarks in a bill. Because the earmarks are described in a standard way, they can easily be fed into a database, where they can be displayed, searched, and reported on.
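As a rough illustration of that last step, the Python sketch below parses a trimmed-down earmarks document with the standard library and loads it into an in-memory SQLite table. The table name, its columns, and the choice of earmarkid as primary key are assumptions made for the example, not part of the proposed standard.

```python
import sqlite3
import xml.etree.ElementTree as ET

NS = "{http://earmarkdata.org/schemas/earmark}"

# A trimmed-down earmarks document in the format described in this article.
SAMPLE = """<earmarks xmlns="http://earmarkdata.org/schemas/earmark">
  <earmark earmarkid="12345">
    <projectname>Earmark XML serialization</projectname>
    <amount>12345</amount>
  </earmark>
</earmarks>"""


def load_earmarks(xml_text, conn):
    """Insert every <earmark> into a hypothetical 'earmarks' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS earmarks "
        "(earmarkid TEXT PRIMARY KEY, projectname TEXT, amount INTEGER)"
    )
    root = ET.fromstring(xml_text)
    for em in root.findall(NS + "earmark"):
        amount = em.findtext(NS + "amount")
        conn.execute(
            "INSERT INTO earmarks VALUES (?, ?, ?)",
            (
                em.get("earmarkid"),
                em.findtext(NS + "projectname"),  # None (SQL NULL) if missing
                int(amount) if amount else None,  # missing or empty -> NULL
            ),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_earmarks(SAMPLE, conn)
print(conn.execute("SELECT earmarkid, projectname, amount FROM earmarks").fetchall())
# [('12345', 'Earmark XML serialization', 12345)]
```

Mapping both a missing and an empty amount to NULL discards a distinction the XML serialization draws; a fuller loader would need to preserve it.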


Abstract Data Models

This is an abstract description of earmark requests, earmarks, and related entities. It purposefully avoids explicit typing of properties, aggressive normalization of entities, and other concerns which are specific to a schema implementation or to a serialization format.

This abstract data model consists of three basic elements:

The presence of a property on an individual entity is an assertion that the entity has that property with the indicated value or values. The absence of a property on an individual entity is an assertion that the value of that property is not known to whoever or whatever supplies the entity. Properties have no implicit or default values.

The basic entities and their properties are described below. The property lists are not intended to be exhaustive.



Earmark Requests

An earmark request is any communication from a senator or member of Congress to a congressional committee requesting legislative provisions that set aside funds for a specific program, project, activity, institution, or location. Such provisions normally circumvent merit-based or competitive allocation processes and appear in spending, authorization, tax, and tariff bills.

An earmark request is uniquely identified by its earmarkrequestid taken together with its fiscalyear.
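A minimal sketch of what this composite key means in practice, assuming requests are serialized as earmarkrequest elements carrying earmarkrequestid and fiscalyear attributes, as in the XML examples later in this document. The same earmarkrequestid may recur in different fiscal years; only the pair has to be unique. The function name index_requests is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://earmarkdata.org/schemas/earmark}"


def index_requests(xml_text):
    """Map (earmarkrequestid, fiscalyear) -> <earmarkrequest> element."""
    index = {}
    for req in ET.fromstring(xml_text).findall(NS + "earmarkrequest"):
        key = (req.get("earmarkrequestid"), req.get("fiscalyear"))
        if key in index:
            # Neither attribute alone is unique, but the pair must be.
            raise ValueError(f"duplicate earmark request key: {key}")
        index[key] = req
    return index


DOC = """<earmarkrequests xmlns="http://earmarkdata.org/schemas/earmark">
  <earmarkrequest fiscalyear="2010" earmarkrequestid="12345"/>
  <earmarkrequest fiscalyear="2011" earmarkrequestid="12345"/>
</earmarkrequests>"""

print(sorted(index_requests(DOC)))  # [('12345', '2010'), ('12345', '2011')]
```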

The properties of an earmark request entity are:

Earmarks

An earmark is a legislative provision that sets aside funds for a specific program, project, activity, institution, or location. Earmarks may be included in appropriations or authorization bills. (See Citation below.)

An earmark request does not always become an earmark, but every earmark should be associated with an earmark request. An earmark that is not associated with a request is either an undisclosed earmark or a presidentially requested earmark. An earmark has many properties similar to those of an earmark request, but these properties are not redundant: they describe the earmark as it appears in legislation rather than as it was requested.
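A consistency check along these lines could flag earmarks that need a closer look. This sketch assumes the XML serialization described later, where an earmark points to its requests with earmarkrequest elements, and a caller-supplied set of known (earmarkrequestid, fiscalyear) keys; the function name unlinked_earmarks is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://earmarkdata.org/schemas/earmark}"


def unlinked_earmarks(earmarks_xml, known_request_keys):
    """Return ids of earmarks that reference no known earmark request."""
    unlinked = []
    for em in ET.fromstring(earmarks_xml).findall(NS + "earmark"):
        refs = {
            (ref.get("earmarkrequestid"), ref.get("fiscalyear"))
            for ref in em.findall(NS + "earmarkrequest")
        }
        # No resolvable request: candidate undisclosed or presidential earmark.
        if refs.isdisjoint(known_request_keys):
            unlinked.append(em.get("earmarkid"))
    return unlinked


EARMARKS = """<earmarks xmlns="http://earmarkdata.org/schemas/earmark">
  <earmark earmarkid="1">
    <earmarkrequest fiscalyear="2011" earmarkrequestid="12345"/>
  </earmark>
  <earmark earmarkid="2">
    <norequesterreason>No requester given</norequesterreason>
  </earmark>
</earmarks>"""

print(unlinked_earmarks(EARMARKS, {("12345", "2011")}))  # ['2']
```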

The properties of an earmark are:

Citation

A citation is an entity that unambiguously identifies the legislation or legislative document containing the earmark. It must contain all the information a member of the public needs to locate the document and the part of the document that includes the earmark. As far as possible, its properties should use well-defined formats, or the entity should include additional properties, to facilitate machine processing so that citations can be easily identified and cross-referenced.

Earmark Requesters

An earmark requester is a senator or representative who submits a particular earmark request or earmark.

Note that if a govtrackpersonid property is not present, there is no guarantee that the remaining properties will be sufficient to uniquely identify the requester.

Aside from the govtrackpersonid property, all properties should be interpreted as applicable to that person at the time indicated by the date property of the earmark request.
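One way a consumer might group requesters, sketched under the assumption that a requester's properties have been read into a Python dict using the property names from the example documents later in this article. The function name requester_key is hypothetical, and the fallback key is flagged as unreliable, matching the caveat above.

```python
def requester_key(requester):
    """Return (key, reliable) for an earmark requester given as a dict.

    govtrackpersonid is treated as a stable identifier. Without it, fall back
    to a composite of the other properties, which may not be unique and which
    describe the person only as of the earmark request's date.
    """
    if "govtrackpersonid" in requester:
        return ("govtrack", requester["govtrackpersonid"]), True
    fields = ("type", "state", "districtorclass", "firstname", "lastname")
    return tuple(requester.get(f) for f in fields), False


print(requester_key({"govtrackpersonid": "54321"}))
# (('govtrack', '54321'), True)
```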

Beneficiaries

A beneficiary is an organization that is to receive funds, as proposed in an earmark request or as directed in an earmark.

If both the duns property and the additional optional properties are present but the entry in the DUNS database contains information which does not correspond to the values of the optional properties, this specification does not define which set of information is more authoritative.
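Since the specification deliberately leaves such conflicts unresolved, a consumer can at most report them. The sketch below does only that. duns_registry is a hypothetical stand-in for a lookup against the DUNS database, the beneficiary is assumed to be a plain dict of property values, and the function name duns_conflicts is likewise illustrative.

```python
def duns_conflicts(beneficiary, duns_registry):
    """Return {property: (document_value, registry_value)} for disagreements.

    Which side is authoritative is left to the caller, as in the spec.
    """
    record = duns_registry.get(beneficiary.get("duns"))
    if record is None:
        return {}  # no DUNS number, or no registry entry to compare against
    return {
        name: (value, record[name])
        for name, value in beneficiary.items()
        if name != "duns" and name in record and record[name] != value
    }


registry = {"123456789": {"name": "Example Corporation", "city": "Washington"}}
beneficiary = {"duns": "123456789", "name": "Example Corp", "city": "Washington"}
print(duns_conflicts(beneficiary, registry))
# {'name': ('Example Corp', 'Example Corporation')}
```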

Schemas and Serializations

This document defines machine-usable schemas and serializations of the Earmark data model detailed in section one.

Only one schema and serialization is currently defined.

XML Schema

There are two XML document types: documents that list Earmark Requests and documents that list Earmarks. They are distinguished by their root element, earmarkrequests or earmarks respectively.

All elements in this serialization occupy the namespace http://earmarkdata.org/schemas/earmark.
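A sketch of telling the two document types apart by their root element, using Python's xml.etree.ElementTree. ElementTree reports namespaced tags in {namespace}localname form, so a root element outside the earmark namespace is rejected. The function name document_kind and its return labels are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://earmarkdata.org/schemas/earmark"


def document_kind(xml_text):
    """Return 'requests' or 'earmarks' based on the namespaced root element."""
    root = ET.fromstring(xml_text)
    if root.tag == f"{{{NS}}}earmarkrequests":
        return "requests"
    if root.tag == f"{{{NS}}}earmarks":
        return "earmarks"
    raise ValueError(f"not an earmark document: root is {root.tag!r}")


print(document_kind('<earmarks xmlns="http://earmarkdata.org/schemas/earmark"/>'))
# earmarks
```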

General Principles

Element and attribute names match the properties of the abstract data model as closely as possible.

Elements may represent (1) entities, (2) properties, (3) collections of entities, or (4) entity-properties (properties whose value is zero, one, or more entities):

  1. Elements which represent entities are named after the entity type they represent. Properties that uniquely identify the entity are attributes of the element that represents it. All other properties of the entity are represented as child elements (see 2 and 4 below).
  2. Elements which represent properties whose value is not an entity will have the name of the property as an element name and contain the value of the property as child text of that element. These properties never have attributes. The order of these elements among their sibling elements is not significant.
  3. Elements which represent collections of entities and do not represent any property are merely containers for their child elements (which represent entities). No order among the entities is implied. There are only two such elements—the root elements earmarkrequests and earmarks.
  4. Elements which represent entity-properties have the name of the property and an Entity element as their parent (as in 2 above) but are otherwise structured identically to Entities (see 1 above) in that they use attributes to represent uniquely-identifying properties and child elements to represent properties of the entity.
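The rules above are regular enough that a generic converter can turn any entity element into nested dictionaries. In this sketch, ENTITY_PROPERTIES is an assumption drawn from the example documents; a real implementation would take it from the schema, because an entity-property with no attributes or children cannot be told apart from a plain property by its shape alone. Values are kept in lists, since some properties accept sets of values, and elements in other namespaces are skipped.

```python
import xml.etree.ElementTree as ET

NS = "{http://earmarkdata.org/schemas/earmark}"

# Assumption: entity-valued properties, taken from the example documents.
ENTITY_PROPERTIES = {"earmarkrequest", "earmarkrequester", "beneficiary", "citation"}


def entity_to_dict(el):
    """Convert an entity (rule 1) or entity-property (rule 4) element."""
    # Attributes carry the uniquely identifying properties.
    entity = {"id": dict(el.attrib), "properties": {}}
    for child in el:
        if not child.tag.startswith(NS):
            continue  # extension element in another namespace
        name = child.tag[len(NS):]
        if name in ENTITY_PROPERTIES:
            value = entity_to_dict(child)  # rule 4: nested entity
        else:
            value = child.text or ""  # rule 2: text value ("" if empty)
        entity["properties"].setdefault(name, []).append(value)
    return entity


SAMPLE = """<earmark xmlns="http://earmarkdata.org/schemas/earmark" earmarkid="12345">
  <amount>12345</amount>
  <beneficiary duns="987654321"/>
  <beneficiary duns="123456789"><city>Washington</city></beneficiary>
</earmark>"""

print(entity_to_dict(ET.fromstring(SAMPLE)))
```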

Where a property or entity-property may contain a set of values, each value in the set is represented by a separate element whose parent is the element representing the entity.

Sets of values for a single property do not have a containing element so as to simplify machine processing by removing exceptions which require knowledge of entity-properties’ cardinality. However, a document is invalid if it contains multiple property-representing elements for a single entity and that property cannot have a set of values. For example, if a document contains an earmark element with multiple amount child elements, that document is invalid since the amount property can only accept one value, not a set of values.
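Because sets of values have no container element, a validator has to count sibling elements. This sketch assumes only what the text states, that amount is single-valued; SINGLE_VALUED is a hypothetical list that a real validator would derive from the schema. Elements in other namespaces are ignored.

```python
from collections import Counter
import xml.etree.ElementTree as ET

NS = "{http://earmarkdata.org/schemas/earmark}"

# Only "amount" is confirmed single-valued by the text; extend from the schema.
SINGLE_VALUED = {"amount"}


def cardinality_errors(entity_el):
    """Return names of single-valued properties that occur more than once."""
    counts = Counter(
        child.tag[len(NS):] for child in entity_el if child.tag.startswith(NS)
    )
    return sorted(
        name for name, n in counts.items() if n > 1 and name in SINGLE_VALUED
    )


BAD = """<earmark xmlns="http://earmarkdata.org/schemas/earmark" earmarkid="1">
  <amount>100</amount>
  <amount>200</amount>
  <beneficiary duns="123456789"/>
  <beneficiary duns="987654321"/>
</earmark>"""

print(cardinality_errors(ET.fromstring(BAD)))  # ['amount']
```

Repeated beneficiary elements pass, since that property accepts a set of values.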

There is a semantic difference between an empty element or attribute and a missing element or attribute. Where an element or attribute is missing, the document is not asserting anything about the property or entities that element or attribute represents—the document has no knowledge of the value of that property. Where an element or attribute is empty, the document is asserting that the property or entity represented by that element or attribute has no value. Where a property accepts a set of values, an empty set is represented by a single element with no child nodes.
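A minimal sketch of keeping the missing/empty distinction when reading a single-valued, non-entity property. The UNKNOWN sentinel and the function name text_property are illustrative choices, not part of the specification.

```python
import xml.etree.ElementTree as ET

NS = "{http://earmarkdata.org/schemas/earmark}"
UNKNOWN = object()  # sentinel: the document asserts nothing about the property


def text_property(entity_el, name):
    """Read a text property while keeping "missing" and "empty" apart."""
    el = entity_el.find(NS + name)
    if el is None:
        return UNKNOWN  # missing element: value not known to the document
    if not el.text:
        return None  # empty element: asserted to have no value
    return el.text


DOC = """<earmark xmlns="http://earmarkdata.org/schemas/earmark" earmarkid="1">
  <amount>12345</amount>
  <description/>
</earmark>"""
em = ET.fromstring(DOC)

print(text_property(em, "amount"))                   # 12345
print(text_property(em, "description"))              # None
print(text_property(em, "projectname") is UNKNOWN)   # True
```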

Additional properties not defined in this schema or this data model may be added as child elements of existing entity-representing elements, alongside the defined property elements, so long as they conform to element types 1, 2, or 4 as described above. These additional properties must use elements and attributes in a different XML namespace.
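A sketch of how a consumer might separate core properties from extensions. The namespace http://example.org/ns/local and the committee element are hypothetical. Treating a child element with no namespace at all as an error is this sketch's reading of the rule that extensions must occupy a different namespace.

```python
import xml.etree.ElementTree as ET

NS = "{http://earmarkdata.org/schemas/earmark}"


def split_extensions(entity_el):
    """Split child elements into (core, extensions) by namespace."""
    core, extensions = [], []
    for child in entity_el:
        if child.tag.startswith(NS):
            core.append(child)
        elif child.tag.startswith("{"):
            extensions.append(child)  # some other namespace: an extension
        else:
            raise ValueError(f"element {child.tag!r} has no namespace")
    return core, extensions


DOC = """<earmark xmlns="http://earmarkdata.org/schemas/earmark"
         xmlns:ex="http://example.org/ns/local" earmarkid="12345">
  <amount>12345</amount>
  <ex:committee>Appropriations</ex:committee>
</earmark>"""

core, ext = split_extensions(ET.fromstring(DOC))
print([e.tag for e in ext])  # ['{http://example.org/ns/local}committee']
```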

All properties that are required in the data model must be present in an XML document for that document to be considered valid. All other properties may be omitted without affecting its validity.

Additional simple data types

Certain properties must have a machine-readable value, the format of which is specified here.

Relax NG schema

[TODO. Should also specify simple types, so XSchema may be necessary for those as well.]

Example Serializations

Below is an example document listing Earmark Requests:

<?xml version="1.0" encoding="UTF-8" ?>
   <!--
        Sample Earmark Request XML document
    -->
   <earmarkrequests xmlns="http://earmarkdata.org/schemas/earmark">
       <earmarkrequest fiscalyear="2011" earmarkrequestid="12345">
           <projectname>Earmark XML serialization</projectname>
           <amount>12345</amount>
           <description>Create an XML serialization schema for Earmark requests and Earmarks.</description>
           <date>2010-02-06</date>
           <source>http://example.org/uri/to/complete/request/letter.pdf</source>
           <earmarkrequester govtrackpersonid="54321" />
           <earmarkrequester>
               <type>S</type>
               <state>NY</state>
               <districtorclass>1</districtorclass>
               <firstname>John</firstname>
               <lastname>Doe</lastname>
           </earmarkrequester>
           <beneficiary duns="123456789">
               <name>Example Corp</name>
               <address>1234 Example Ln</address>
               <city>Washington</city>
               <state>DC</state>
               <postalcode>20017</postalcode>
               <country>US</country>
           </beneficiary>
       </earmarkrequest>
   </earmarkrequests>

Below is an example document listing Earmarks:


   <?xml version="1.0" encoding="UTF-8" ?>
   <!--
        Sample Earmark XML document
    -->
   <earmarks xmlns="http://earmarkdata.org/schemas/earmark">
       <earmark earmarkid="12345">
           <type>appropriation</type>
           <earmarkrequest fiscalyear="2011" earmarkrequestid="president"/>
           <earmarkrequest fiscalyear="2011" earmarkrequestid="12345"/>
           <!-- OR <norequesterreason>No requester given</norequesterreason> -->
           <fiscalyear>2010</fiscalyear>
           <projectname>Earmark XML serialization</projectname>
           <amount>12345</amount>
           <description>Create an XML serialization schema for Earmark requests and Earmarks.</description>
           <source>http://example.org/uri/to/complete/request/letter.pdf</source>
           <beneficiary duns="987654321"/>
           <beneficiary duns="123456789">
               <address>1234 Example Ln</address>
               <city>Washington</city>
               <state>DC</state>
               <postalcode>20017</postalcode>
               <country>US</country>
           </beneficiary>
           <citation>
               <earmarkbill>111-H-4321</earmarkbill>
               <earmarkreport>J. Rept. 111-212</earmarkreport>
               <location>Section D</location>
               <excerpt>Funds are to be allocated to create an Earmark Data Model and Specification.</excerpt>
               <link>http://example.org/bills/111/h4321.html</link>
               <link>http://example.com/111/HR/4321.xml#section_d</link>
           </citation>
       </earmark>
   </earmarks>