Tuesday 4 November 2008

Electronic Data Interchange Needs Electronic Paper

There is probably a word for it. You know, when you are struggling to express an idea and try several times without being satisfied that you are getting your point across. Or indeed not sure what your point is. Then someone else hits the nail on the head. Then you feel the ping of recognition, the swell of confirmed pride and the annoying jealousy of "I wish I had said that".

I want EDI to be easier. Correction, I want EDI to be easy.

In a previous article I was poking fun at the format explosion and buzzword bingo that goes on in the EDI world. The idea that standards are good so we need more of them. The dissing of everything that has gone before as "legacy". The constant reinvention of the wheel without learning the lessons of history.

As a throw-away remark I said that in a future article I would announce a solution based on "Facsimile Technology". I knew what I was referring to, but until now I hadn't been able to express it clearly. Then I read it elsewhere.

At The Register, Chris Mellor has an article entitled "The latest EDI money saver? Paper invoices - Use humans, save money".

I suspect the title is being deliberately provocative. He describes a system where a scanner is used with OCR to create "electronic copies" of invoices that are machine-readable. This is not new, and there are other scanner/OCR solution providers out there. If you can't get all your suppliers to send EDI invoices, it is a great way to deal with paper invoices.

The people selling this system explain their thinking...

EDI leaves IT departments perpetually recoding (changing standards, used in differing ways, new formats, different communication protocols).
No all-embracing EDI standard is going to emerge.
The number of paper invoices still being produced after all these years of EDI effort - is massive.

... so if EDI is a failure, accept it and learn to deal with the paper.


I don't want EDI to be a failure, yet I can't believe in a "paperless office". So let's study what paper has going for it.

Paper is ubiquitous, cheap and can be put to many uses.
It can be supplied by many sources, and the differences that exist (size, weight, color) don't cause tie-in.
Organisations are already adapted to handle it.
Humans interface with it easily.


Look at it the other way, as a sender of invoices, and accept that some receivers will simply convert (map) the files to paper. Is there an electronic form that EDI-enabled receivers can use, yet is just as good as paper for those that aren't enabled? To come close, I believe the receiving user should be able to view a list of invoices in a computer folder, like any shared folder. A double-click should instantly display the same image as the old paper format. Click print, and it prints. In the folder view, highlight several, right-click, select print, and several print.

HTML & PDF files do this. XML with style sheets comes close. ODF & OOXML are related to XML. DOC is closed, secret and proprietary.

The file types that don't do this are X12, Edifact, Eancom, Tradacom, Odette (sorry, traditional EDI).

The file format is only part of the story. The delivery mechanism needs to be as universal as snail mail. Is there an electronic protocol that EDI-enabled receivers can use, yet is just as good as snail mail for those that aren't enabled?

Email is close to this. However, it is a bit like using postcards with no certainty of delivery.


In my view EDI needs to be so simple and fault-tolerant that it can be done with a laptop and office productivity software. That may mean we have to make changes to our office productivity software. But we definitely need to make changes to our EDI.

Friday 15 August 2008

Freedom !

I have just read Craig's post on EDITalk and was motivated to leave a long comment. So I have decided to re-post it here.

Quoting Braveheart is like a red rag to a bull to me. I have got to respond.

I am going to go on a bit about history here, but bear with me. I will come back to EDI (Electronic Data Interchange) at the end.

For the average Joe, the freedom Wallace (and Mel Gibson) was fighting for was to be lorded over by a Scotsman (Bruce) rather than an Englishman (Edward). In both cases Joe remained a peasant/serf with no right to land ownership, an obligation to pay taxes (without representation) and little hope of an evenly applied "rule of law". Is the race of the King significant?

The attempts by various English Kings of this period (1200-1350 ?) to acquire Scotland were a MAJOR historical cock-up that delayed the creation of a unified nation (Britain) by hundreds of years. For all the historical inaccuracies of the Braveheart film, one point it did correctly portray was that the Scottish ruling elite owned significant landholdings in England.

Inter-marriage and cross-cultural exchange were bringing both countries together. If things had been left alone and there hadn’t been so many wars, unification might have happened anyway by joint inheritance. This eventually happened in 1603, when Elizabeth I of England was succeeded by James VI of Scotland (her father's sister's great-grandson, or first cousin twice removed, I think!)

Even then, things were not settled. The last big rebellion began in 1745 and ended when "Bonnie Prince Charlie" was defeated at the Battle of Culloden in 1746. This event holds an emotional place in the culture of Scots and is seen by many as the "death of nationhood". However, in this battle there were more Scots fighting on the "English" side than English. The alternative view is that this event marks the end of "tribalism" on this island and the rise of the modern Nation State.

Back to EDI.

From this perspective the freedom to choose a format looks like a freedom to choose to be enslaved by a ruling elite.

The freedom to depart from the standard is the freedom not to be bothered with the law - that is for little people (or suppliers).

The freedom to reject the established standards and formats completely and develop your own is like sailing off to the New World in hope of a better life. You know it is not going to be a bed of roses but you think it is the future.

So how do we avoid repeating the mistakes of history? I think it is important to understand what matters and what makes a difference to our lives.

A N Other encoding format is no better than one we already support. The new one might be easier to utilise, but it requires effort to get there from here. Using what we have is easier.

A single dominant format that was accepted by all would produce future savings and improvements. It would also be disruptive to the status quo. So to succeed it would need additional benefits beyond simple convergence, or we are in for a long wait.

Monday 11 August 2008

CSVML - Accept No Compromise

Can't decide between XML and CSV? I have the answer, and it is not JSON as I previously thought.

I have seen the light.

The answer is here. Hilarious! (well it is hilarious if you are a geek).

In a future post I shall show how the future of EDI (Electronic Data Interchange) is a merger of Edifact, X12 & Tradacom by using Facsimile technology...

Sunday 20 July 2008

Alternative EDI Formats Part II – JSON & Protocol Buffers

In the previous post I wrote how a large amount of EDI (that is, Electronic Data Interchange in the widest sense) is done using CSV formats rather than a strict, formalised standard. Now Google has released details of how they execute server-to-server/program-to-program message interchange using Protocol Buffers. You won’t see the term EDI anywhere on Google, but then the term doesn’t have a sexy web 2.0 image.

Google rejected the use of XML. I am all for that. To be fair, I think this is more to do with the desire for a binary format for super-fast, super-scalable encoding and decoding. Inter-company EDI is universally text-based. I can’t see that changing.

The first thing I noticed about the .proto files is their similarity to JSON. Their use seems to have pre-dated the popularisation of JSON. In other areas I have seen Google use YAML for similar definition purposes.

The .proto files are not message files. They are not sent as part of a message, ever. They are used to automatically compile programs to handle messages in the format defined by these files.

Now this struck me, because I think this is one area where CSV beats traditional EDI standards. That first row of column headings is like the file definition. If a trading partner adds new columns (or removes or moves columns) the next time he sends the same type of message, it doesn’t matter. We don’t need to agree beforehand. The receiver can identify which cell holds which piece of information by locating the column heading's position.
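A minimal sketch of that idea in Python, using the standard library's csv.DictReader, which takes its keys from the first row. The two sample files and the read_rows() helper are hypothetical: in the second file the sender has reordered the columns and added a fax column, yet lookups by heading still work.

```python
import csv
import io

# Two hypothetical deliveries of the same message type. In the second,
# the sender has moved the columns around and added a 'fax' column.
MONDAY = """name,id,email,phone
Jodie Foster,1,jfoster@silence.com,555-1234
"""

TUESDAY = """id,name,phone,fax,email
1,Jodie Foster,555-1234,555-1235,jfoster@silence.com
"""


def read_rows(text):
    """Return one dict per data row, keyed by the column headings in row 1."""
    return list(csv.DictReader(io.StringIO(text)))


for text in (MONDAY, TUESDAY):
    for row in read_rows(text):
        # Look fields up by heading, not by position; extra columns are ignored.
        print(row["name"], row["email"])
```

Both files print the same name and email, even though neither column is in the same place.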

Stripping the .proto example down to equate it with our first simplified JSON message data from the previous post, we get the following,

[['Jodie Foster',1,'jfoster@silence.com','555-1234'],
['Sigourney Weaver',2,'sweaver@alien.org','555-9876'],
['Drew Barrymore',3,'dbarrymore@angel.net','555-2468']]

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
  repeated PhoneNumber phone = 4;
}


The field list is in a [Modifier – Type – Field Name – Sequence] format. Modifier and Type wouldn’t make much sense in JSON which is not restrictive in its type usage. Incorporating the sequence number into our JSON definition section gives us a useful ability.

{'definition':{'name':0,'id':1,'email':2}}

MessageObject.definition[‘name’] returns 0
Or,
MessageObject.definition.name returns 0

MessageObject.data[0][MessageObject.definition[‘name’]] returns Jodie Foster

Now we have the same ability to cope with our trading partners adding, moving and removing fields without the format losing its meaning.
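The same lookup as a small Python sketch. Two assumptions to flag: strict JSON requires double quotes, so the messages below use them rather than the single quotes above, and the field() helper is purely illustrative. The second message has moved 'id' to the front and omitted 'email'.

```python
import json

# Two versions of the same record: the second sender put 'id' first and
# dropped 'email'. Each message carries its own name -> position map.
MESSAGE_A = json.loads('''
{"definition": {"name": 0, "id": 1, "email": 2},
 "data": [["Jodie Foster", 1, "jfoster@silence.com"]]}
''')
MESSAGE_B = json.loads('''
{"definition": {"id": 0, "name": 1},
 "data": [[1, "Jodie Foster"]]}
''')


def field(message, row, name, default=None):
    """Fetch a named field from a data row using the message's own definition."""
    position = message["definition"].get(name)
    if position is None:  # the sender omitted this field
        return default
    return message["data"][row][position]


print(field(MESSAGE_A, 0, "name"))   # Jodie Foster
print(field(MESSAGE_B, 0, "name"))   # Jodie Foster
print(field(MESSAGE_B, 0, "email"))  # None
```

The check is against None rather than a plain truth test, because position 0 is a perfectly valid position.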

<aside> Did you notice Google started numbering at 1 and not 0? What is that about? That is Muggle thinking! </aside>

What happens when we expand the phone field into a sub-table, as in the previous post? On its own this sub-table would have a definition of,

{'phonenumber':0,'type':1}

but we can't just slot this in to replace the existing phone field definition, because we would lose the positional data. What Protocol Buffers does is list the definitions separately.

{'definition':{'person':{'name':0,'id':1,'email':2,'phone':3},
               'phone':{'phonenumber':0,'type':1}},
 'data':[['Jodie Foster',1,'jfoster@silence.com',
          [['555-1234','home'],
           ['555-777','mobile'],
           ['555-1235','fax']]],
         ['Sigourney Weaver',2,'sweaver@alien.org',
          [['555-9876','home'],
           ['555-0101','office']]],
         ['Drew Barrymore',3,'dbarrymore@angel.net',
          [['555-2468','home']]]]}

a=MessageObject.definition['person']['phone']
b=MessageObject.definition['phone']['phonenumber']
c=MessageObject.definition['phone']['type']
MessageObject.data[0][a][2][b]
returns 555-1235
MessageObject.data[0][a][2][c]
returns fax

In this way the sender can omit any fields they like and the field sequence is no longer important. The receiver can still parse the message and extract the data segments. The message file size is kept to a minimum.
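Here is the nested lookup above as a Python sketch, again written as strict JSON with double quotes; the phones() helper is hypothetical. Every position is resolved through the definitions, so nothing depends on the sender's field order.

```python
import json

# The nested address-book message, written as strict JSON.
MESSAGE = json.loads('''
{"definition": {"person": {"name": 0, "id": 1, "email": 2, "phone": 3},
                "phone":  {"phonenumber": 0, "type": 1}},
 "data": [["Jodie Foster", 1, "jfoster@silence.com",
           [["555-1234", "home"], ["555-777", "mobile"], ["555-1235", "fax"]]],
          ["Sigourney Weaver", 2, "sweaver@alien.org",
           [["555-9876", "home"], ["555-0101", "office"]]],
          ["Drew Barrymore", 3, "dbarrymore@angel.net",
           [["555-2468", "home"]]]]}
''')


def phones(message, person_row):
    """Yield (number, type) pairs for one person, resolving every position by name."""
    person_def = message["definition"]["person"]
    phone_def = message["definition"]["phone"]
    person = message["data"][person_row]
    for entry in person[person_def["phone"]]:
        yield entry[phone_def["phonenumber"]], entry[phone_def["type"]]


print(list(phones(MESSAGE, 0)))
# [('555-1234', 'home'), ('555-777', 'mobile'), ('555-1235', 'fax')]
```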

This is not JSONML (although that is interesting in its own right). This is about efficiently transporting a (potentially large) list of data objects of the same type.

Thursday 17 July 2008

Alternative EDI Formats Part I – CSV & JSON

I have been meaning to make this post for a long time, then Google came along with Protocol Buffers and the world moves on. So in this post I am going to outline how CSV files are used and how I thought JSON would be an improvement. In another post I will write about what I think can be learnt from Protocol Buffers.

A lot of data is communicated from machine to machine by CSV file format. It might not be strict EDI but it is electronic data interchange. It almost feels like an uncomfortable little secret no one likes to talk about (OK I admit it. I am trying to avoid the Elephant cliché).

To show what I mean, look at the number of responses to these keyword searches on Google. I know it isn't an accurate measure (compare Tradacom with Tradacom & EDI !?!?) but this is just for indicative purposes.

Keywords          Number of Google links
X12               46,400,000
X12 & EDI         295,000
EDIFACT           802,000
EDIFACT & EDI     241,000
Tradacom          4,410
Tradacom & EDI    5,350
XML               650,000,000
XML & EDI         451,000
JSON              8,680,000
JSON & EDI        21,000
CSV               52,400,000
CSV & EDI         1,040,000


Note that CSV outranks all the other terms when combined with EDI. It even outranks the unqualified EDIFACT search, the ‘UN’ standard for EDI.

Why? Well CSV is easy. It is human readable. It can be output from spreadsheet programs. Most of all, the columns and rows closely resemble the way data is stored in RDBMS tables which is the destination of most EDI data.

Taking inspiration from Google’s Protocol Buffer example, an address book could be represented as follows…

name,id,email,phone
Jodie Foster,1,jfoster@silence.com,555-1234
Sigourney Weaver,2,sweaver@alien.org,555-9876
Drew Barrymore,3,dbarrymore@angel.net,555-2468

All the programmer needs is a ‘splitting’ function to slice the file up, first by carriage returns, then by commas. In JSON format this same data may be represented as follows…
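That splitting approach is fine for the address book above, but it breaks as soon as a value contains a comma. The usual convention is to quote such fields, and a real CSV parser honours the quotes. A small Python comparison, with a made-up "Foster, Jodie" value to trigger the problem:

```python
import csv
import io

TEXT = 'name,id,email,phone\n"Foster, Jodie",1,jfoster@silence.com,555-1234\n'

# Naive approach: split into lines, then split each line on commas.
naive = [line.split(",") for line in TEXT.splitlines()]
print(len(naive[1]))  # 5; the comma inside the quoted name split it in two

# The standard library csv module honours the quoting.
parsed = list(csv.reader(io.StringIO(TEXT)))
print(len(parsed[1]))  # 4
print(parsed[1][0])    # Foster, Jodie
```

Quoting conventions vary between producers, so even CSV needs a little agreement between trading partners.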

[{'name':'Jodie Foster','id':1,'email':'jfoster@silence.com','phone':'555-1234'},
{'name':'Sigourney Weaver','id':2,'email':'sweaver@alien.org','phone':'555-9876'},
{'name':'Drew Barrymore','id':3,'email':'dbarrymore@angel.net','phone':'555-2468'}]

MessageObject[0].name returns Jodie Foster

However the file size has just ballooned. To overcome this, it could be represented in JSON another way to produce a much smaller file…

{'definition':['name','id','email','phone'],
'data':[['Jodie Foster',1,'jfoster@silence.com','555-1234'],
['Sigourney Weaver',2,'sweaver@alien.org','555-9876'],
['Drew Barrymore',3,'dbarrymore@angel.net','555-2468']]}

MessageObject.definition[0] returns name
MessageObject.data[0][0] returns Jodie Foster
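To check the size claim, here is a Python sketch that converts the list-of-objects form into the definition/data form and back. It assumes every object has the same keys in the same order; the helper names are illustrative. Exact byte counts depend on whitespace and the data, so only the comparison is shown.

```python
import json

ROWS = [
    {"name": "Jodie Foster", "id": 1, "email": "jfoster@silence.com", "phone": "555-1234"},
    {"name": "Sigourney Weaver", "id": 2, "email": "sweaver@alien.org", "phone": "555-9876"},
    {"name": "Drew Barrymore", "id": 3, "email": "dbarrymore@angel.net", "phone": "555-2468"},
]


def to_compact(rows):
    """Turn same-shaped objects into {'definition': [...], 'data': [[...], ...]}."""
    definition = list(rows[0])  # key order of the first object
    return {"definition": definition,
            "data": [[row[key] for key in definition] for row in rows]}


def from_compact(message):
    """Rebuild the list of objects from the compact form."""
    return [dict(zip(message["definition"], values)) for values in message["data"]]


verbose = json.dumps(ROWS, separators=(",", ":"))
compact = json.dumps(to_compact(ROWS), separators=(",", ":"))
print(len(verbose), len(compact))  # compact is shorter: each key appears once
```

The saving grows with the number of rows, since the verbose form repeats every key in every object.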

Now suppose Ms Foster is good enough to give us her mobile and fax number in addition. The ‘phone’ field becomes a list. For the CSV file, another delimiter is needed.

name,id,email,phone
Jodie Foster,1,jfoster@silence.com,555-1234/555-777/555-1235
Sigourney Weaver,2,sweaver@alien.org,555-9876
Drew Barrymore,3,dbarrymore@angel.net,555-2468

But what if we want to hold phone number type as well (home, mobile, office, fax etc.)? We have 3 options…
1. add another field, also sub-delimited, where the sequencing matches the other field. 555-1234/555-777/555-1235,home/mobile/fax
2. turn the ‘phone’ field into a compound field. 555-1234]home/555-777]mobile/555-1235]fax. The column heading becomes phone/type.
3. create a separate table for the phone numbers. Each row in this new table needs an identifier linking it to a row in the original table.
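Option 2 needs a parser that knows both separators. A minimal Python sketch, assuming ']' separates the number from its type and '/' separates the entries, as in the example (the function name is hypothetical). Every extra level of nesting would need yet another separator, and none of them may appear in the data itself.

```python
def parse_compound(value, item_sep="/", part_sep="]"):
    """Split a compound CSV cell such as '555-1234]home/555-777]mobile'
    into a list of (number, type) tuples."""
    if not value:
        return []
    return [tuple(item.split(part_sep, 1)) for item in value.split(item_sep)]


print(parse_compound("555-1234]home/555-777]mobile/555-1235]fax"))
# [('555-1234', 'home'), ('555-777', 'mobile'), ('555-1235', 'fax')]
```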

At this point the CSV format is beginning to creak. Beyond one nested table, options 1 & 2 will require ever more delimiters. So let us concentrate on option 3. In isolation this new sub-table would look like this,

id,phone,type
1,555-1234,home
1,555-777,mobile
1,555-1235,fax
2,555-9876,home
2,555-0101,office
3,555-2468,home

These files can be sent separately. If they are to be combined into 1 message then we need to indicate in some way what table each row is part of. Typically this is done by reserving the first column. In this example it could contain phoneheader-definition, phoneheader-data, phonedetail-definition, phonedetail-data.
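A sketch of reading such a combined file in Python. The record-type codes come from the example above; the exact layout (a definition row before the data rows of each table, with the headings following the type code) is an assumption for illustration, and all values come back as strings, as they would from any CSV.

```python
import csv
import io
from collections import defaultdict

# One hypothetical combined file; the first column says which table
# (and which kind of row) follows.
COMBINED = """phoneheader-definition,name,id,email
phoneheader-data,Jodie Foster,1,jfoster@silence.com
phoneheader-data,Sigourney Weaver,2,sweaver@alien.org
phonedetail-definition,id,phone,type
phonedetail-data,1,555-1234,home
phonedetail-data,1,555-777,mobile
phonedetail-data,2,555-9876,home
"""


def split_tables(text):
    """Return {table_name: [row_dict, ...]} from a combined multi-table CSV."""
    headings = {}
    tables = defaultdict(list)
    for record in csv.reader(io.StringIO(text)):
        table, _, kind = record[0].rpartition("-")
        if kind == "definition":
            headings[table] = record[1:]
        elif kind == "data":
            tables[table].append(dict(zip(headings[table], record[1:])))
    return dict(tables)


tables = split_tables(COMBINED)
print(tables["phonedetail"][1])  # {'id': '1', 'phone': '555-777', 'type': 'mobile'}
```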

How would we represent this in our JSON format?

{'definition':['name','id','email',['phone','type']],
 'data':[['Jodie Foster',1,'jfoster@silence.com',
          [['555-1234','home'],
           ['555-777','mobile'],
           ['555-1235','fax']]],
         ['Sigourney Weaver',2,'sweaver@alien.org',
          [['555-9876','home'],
           ['555-0101','office']]],
         ['Drew Barrymore',3,'dbarrymore@angel.net',
          [['555-2468','home']]]]}

MessageObject.definition[3][0] returns phone
MessageObject.data[0][3][2][0] returns 555-1235
MessageObject.data[0][3][2][1] returns fax
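Going from option 3 (a separate phone table linked by id) to this nested form is a simple group-by. A Python sketch, with a hypothetical nest() helper, assuming the id sits in the second position of each person row:

```python
import json
from collections import defaultdict

PEOPLE = [["Jodie Foster", 1, "jfoster@silence.com"],
          ["Sigourney Weaver", 2, "sweaver@alien.org"],
          ["Drew Barrymore", 3, "dbarrymore@angel.net"]]
# The separate phone table: (id, phone, type), linked to PEOPLE by id.
PHONES = [(1, "555-1234", "home"), (1, "555-777", "mobile"), (1, "555-1235", "fax"),
          (2, "555-9876", "home"), (2, "555-0101", "office"), (3, "555-2468", "home")]


def nest(people, phones):
    """Fold the phone sub-table into each person row as a list of [phone, type]."""
    by_id = defaultdict(list)
    for person_id, number, kind in phones:
        by_id[person_id].append([number, kind])
    return {"definition": ["name", "id", "email", ["phone", "type"]],
            "data": [row + [by_id[row[1]]] for row in people]}


message = nest(PEOPLE, PHONES)
print(message["data"][0][3][2])  # ['555-1235', 'fax']
print(json.dumps(message))       # ready to send as one message
```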

While this encodes and represents the same message, is it better than CSV?
It is more extensible, it is slightly bigger, it is probably as human-readable, and probably as machine-readable. I already thought JSON was a good candidate for being the next CSV for EDI. In the next post I will write about how, taking inspiration from Google’s Protocol Buffers, I think it can be improved further.

Wednesday 9 July 2008

Green Coffee XML

I am not kidding (pdf). Some might think this is great. Some might think it shows how wonderful XML is. I don't. To me it represents a lot of what is mixed up about EDI (Electronic Data Interchange). I want to make 2 points...



What is so special about Green Coffee that it needs its own schema?

  • Well reading the docs it seems coffee dealers are a bit fussy about defining when ownership of the product and ownership of the risk (associated with product delivery) is transferred. So they have 9 different order types.
  • As well as the buyer and seller, they need to be precise about the Broker and the Shipper.
  • The quality of the product is defined by a standard and is reflected in the product codes.
  • Pricing can be by formula.
  • Unit of measure is usually Kgs but when it comes to weighing coffee it seems to be important who weighs, when, and who pays for the weighing. I count 8 weighing types.
  • The journey coffee makes can be long and the value of the coffee at different stages changes so it seems the "place of tender" is important. A simple "delivery date" is not precise enough and must be qualified.

Phew! Complicated. But excuse me. Is any one of these points unique to coffee? Maybe the combination is unique. Maybe it is more sophisticated than Acme retail EDI. But what does it gain us to reject all that has gone before in 60 years of EDI and create new EDI ghettos?


I hope they didn't. I hope they just defined some extra tags and specified some extra attribute values, and added them on to some existing, already utilised and proven XML order standard. Which brings me to my next point.


How (for the love of coffee!) can I implement this?


I went in search of the technical details. The PDF document listed 4 XML appendices on the contents page. They seem to be missing from the web. I went to the root URL and clicked around. I couldn't even find my way back to the document. I used Google to search the site - zilch. I used Google to search the web for "Green Coffee XML", no luck.

How can you expect a schema to be used if you won't tell anybody the details? If you want it to succeed, make it freely available! Have you not heard of Peer Review?

Saturday 28 June 2008

Is this what hm.gov.uk thinks is EDI? - Revisited

2 CDs with 7 million names and addresses of children and parents, some with bank details, are put in the post and go missing. When I heard about this story I worried about the little guy. Well, now the UK Independent Police Complaints Commission has investigated and released its report. It is well worth a read (pdf). All 61 pages and 282 paragraphs.

At the time the Minister was quick to publicly blame a "junior official" not following the "rules".

But what if there are no rules? Or too many rules? Or rules that constantly change? Or no one responsible? Or responsibility shared by too many senior people? Or if everyone is responsible it means no one is responsible?

It looks to me like the little guy was trying his best. See paragraph 130

He forwarded this email to Employee J in IMS and asked him to provide the 12 records as requested. Employee F included the following explanation in his email to Employee J:
…All we wanted was for NAO to realise exactly what they were asking for, i.e. the scan data is live records of seven million Chb customers when they only want to look at a dozen cases from the scan. More importantly we needed to get the assurance of how they would securely handle the discs containing the data and how they would dispose of them once they had completed the checking.
Obviously NAO should automatically realise this confidential data has to be protected and no doubt they would do so. However we needed something more than a verbal request to ensure we had the paperwork to back up the request, things do get mislaid and imagine the uproar if the discs containing the ChB customer data went astray and turned up where they shouldn’t – the long knives would be out. At least we would be covering ourselves by getting the right assurance.



Wednesday 2 April 2008

Why isn't EDI easier? Part I - One Standard to Rule Them All

EDI (Electronic Data Interchange) really should be less difficult than it is. If I am a start-up company and I want to purchase timber, metal, paper, widgets or some other commodity item, EDI is just too hard.

The overhead on establishing a relationship, agreeing a standard, and testing communications is huge. EDI is supposed to reduce cost but if it results in a supplier tie-in then it will produce unwelcome influences.

If it takes a department of expensive-to-employ, prickly-to-manage IT geeks to support, then this "cost saving" thing called EDI just got too expensive.

The first thing that springs to most people's minds is Standards. How many ways do we need to represent a Purchase Order? Why are there so many Standards? Wouldn't it be easier if we all just agreed on one standard? The answer to the last question is, not necessarily.

To explain this, consider the classic example of Purchase Order Delivery date. On a multi-line purchase order, a customer will probably want all the items delivering together. But while some customers will send a delivery date in the order header section of the message, some will send delivery dates in the order line sections. Some will send both. Some will have differing dates in the detail line section. Some customers will omit dates altogether indicating the goods are required ASAP.

Don't stop me, I'm on a roll...

Sometimes it will be appropriate to send an Earliest and a Latest date range. This could be in either section. Given a free text field some will write 9/11/01, some will write 11/9/2001 and some computers will generate 20010911. By the way, not everyone in the world agrees what year it is or how many months there are.
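The ambiguity is easy to demonstrate with Python's datetime.strptime: the same characters give two different dates depending on which convention the reader assumes, while an eight-digit year-month-day value has only one reading. Two-digit years add a further guess; Python maps '01' to 2001.

```python
from datetime import datetime

RAW = "9/11/01"

# The same characters under two common conventions.
us = datetime.strptime(RAW, "%m/%d/%y").date()  # month first
uk = datetime.strptime(RAW, "%d/%m/%y").date()  # day first
print(us.isoformat())  # 2001-09-11
print(uk.isoformat())  # 2001-11-09

# A fixed year-month-day layout, like the computer-generated 20010911,
# has only one reading.
fixed = datetime.strptime("20010911", "%Y%m%d").date()
print(fixed.isoformat())  # 2001-09-11
```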

At the other extreme, timing might be important. For example when ordering services like an aircraft flight time, or an insurance period start and finish. This brings in the question of time zones and daylight saving adjustments.

  • So if a Standard is to be universal, it has to be large and complex.


  • But if it is large and complex, it won't be easy to implement.


  • And your next partner will use the Standard in a way that is at least slightly different to all your others.

When you read "X12 is popular in North America, EDIFACT in Europe, TRADACOM in Britain" remember: just agreeing on one Standard doesn't solve anything. One Standard used in two different ways is like two Standards. So the incentive to converge isn't there.

I have been watching the ODF v OOXML Standards dust-up with interest. The difference with EDI is that an EDI Standard has many thousands of individual implementations. If we accept that Standards overlapping as much as ODF and OOXML damage each other, and that differing interpretations of a specification lead to differing implementations and hence to interchange problems between office applications, then we shouldn't be surprised if EDI is in trouble.

I think this is the real source of the excitement over XML. It gives structure to data even without a standard. If we accept that even with a standard, there is a need for a "mapping" function, then maybe we should aim to make this as easy as possible.
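In its simplest form, a mapping function is a lookup table per trading partner from their field names to yours. A hypothetical Python sketch (the partner names and field names are invented) that sets aside anything it does not recognise rather than guessing:

```python
# Hypothetical per-partner maps: partner's field name -> our internal name.
PARTNER_MAPS = {
    "partner_a": {"DelDate": "delivery_date", "Qty": "quantity"},
    "partner_b": {"required_by": "delivery_date", "units": "quantity"},
}


def map_fields(partner, record):
    """Rename a partner's fields to internal names; unmapped fields are kept aside."""
    mapping = PARTNER_MAPS[partner]
    mapped, unmapped = {}, {}
    for key, value in record.items():
        if key in mapping:
            mapped[mapping[key]] = value
        else:
            unmapped[key] = value
    return mapped, unmapped


print(map_fields("partner_b", {"required_by": "20080402", "units": 12, "note": "ASAP"}))
# ({'delivery_date': '20080402', 'quantity': 12}, {'note': 'ASAP'})
```

Adding a new partner then means adding a table, not writing a new program; the hard part, agreeing what each field means, remains.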

Monday 31 March 2008

When did XML become a good idea?

It is so easy to list what is wrong with XML for EDI (Electronic Data Interchange).

  • It produces large files
  • It is very processor intensive to parse
  • Needs another new standard
  • Or can be used/abused without a standard
  • It isn't even very human readable (despite what some say)
And yet... a lot of people find it very compelling.

To understand why, imagine you are talking to a developer who knows nothing about EDI. Show them an EbizXML file, an EDIFACT file, a Tradacom file and an X12 file, all for the same document. The chances are they will immediately be able to tell you the first file is XML, as they know what that is. They will probably even be able to begin to translate its contents.

Explain to this developer you want a subroutine / function / method to extract and return certain elements from the document. They will not blink as they reach for the keyboard and fire up their preferred language tools. They will have a first draft ready in less time than it would take to explain the mere structure of any of the other files.
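Something like the following is what that first draft might look like in Python with the standard library's xml.etree.ElementTree. The order document and its element names are invented for illustration and are not taken from EbizXML or any other standard.

```python
import xml.etree.ElementTree as ET

# A made-up order document; the element names are illustrative only.
ORDER_XML = """<Order number="PO123">
  <DeliveryDate>2008-04-02</DeliveryDate>
  <Line><Item>WIDGET-1</Item><Qty>10</Qty></Line>
  <Line><Item>WIDGET-2</Item><Qty>5</Qty></Line>
</Order>"""


def extract(xml_text):
    """Return the order number, delivery date and (item, qty) lines."""
    root = ET.fromstring(xml_text)
    lines = [(line.findtext("Item"), int(line.findtext("Qty")))
             for line in root.findall("Line")]
    return root.get("number"), root.findtext("DeliveryDate"), lines


print(extract(ORDER_XML))
# ('PO123', '2008-04-02', [('WIDGET-1', 10), ('WIDGET-2', 5)])
```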

Any non-developer who has just read this is now scratching their head, wondering why file structures would have to be explained. They are also wondering why the problems listed at the beginning are... problems?