Wednesday, 11 January 2012

Green Coffee XML - Revisited

I wrote before about my frustration at EDI Standards Proliferation. I also said if there was anything more frustrating than EDI Standards Proliferation, it was EDI Standards Secrecy.

Well done to the Green Coffee Organisation. I can now find reference material for Green Coffee XML on the internet for free. This link leads you to a 124 page PDF document, Contract Terms & Conditions. There listed in appendix A-D is the specification.

[It is there at time of writing. Whether it is moved after the next site redesign is another issue.
So long as it is findable by the major search engines.]

I can't make head or tail of it though (excuse the pun). No worked examples. While I am not an expert on XML, I believe most definitions use DTD or XML Schema. This does not look like either. But still, that is their right and freedom.

The licence is "...royalty-free, worldwide..." and covers copies "...all or part... any form..." provided the copyright notice is retained and the patent truce is maintained.

Well done Green Coffee Org.

Thursday, 5 January 2012

The Limits of the Catalogue

I have a problem. I don't normally like to talk about it, but you may share the same problem and if we get it out in the open, then maybe we can all cope with it better. The problem with this problem, is it doesn't have an obvious name. So while no one likes to be labeled, without a label it is hard to communicate easily. How to describe it? Where to begin?

The problem stems from the nature of the product we supply and how it is handled by EDI (Electronic Data Interchange). These widgets come in different sizes. There are industry standard sizes. 10 different widths, 15 different heights, 5 different depths.

So that is 10 x 15 x 5 = 750 different sizes.

Then there are the colors. We bring out new ones each year, some colors are retired and some popular classics are always going to be available. Say about 25 colors are current with about 5 changing each year.

750 x 25 = 18,750

Don't think these are just boring old boxes. We have several 'styles' to choose from. Contemporary, Classic, Gothic, Art deco. The list goes on. Like colors, say 15 styles with about 3 changing each year.

18,750 x 15 = 281,250

As an available option we can supply a low power economy version, standard, high power or "Max Power" business version.

281,250 x 4 = 1,125,000

They can be with or without an extra adapter port. With or without the toughened, rubberized embedded protection. With or without an "easy-grip" handle.

1,125,000 x 2 x 2 x 2 = 9,000,000

That is 9 million different products! I will stop there but actually there is more. We don't just do widgets. We do widget fittings and widget accessories. We deal with products that are alternatives to widgets and complementary to widgets. We also do bespoke "made-to-measure" widgets.
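
The running totals above can be checked with a few lines of Python. This is only a sketch: the option names are labels made up for the illustration, and the counts are the ones quoted in the text.

```python
from math import prod

# Number of choices for each independent option, as quoted in the text.
# The dictionary keys are illustrative labels, not real field names.
OPTION_COUNTS = {
    "width": 10,
    "height": 15,
    "depth": 5,
    "color": 25,
    "style": 15,
    "power": 4,
    "adapter_port": 2,  # with or without
    "protection": 2,    # with or without
    "handle": 2,        # with or without
}


def running_totals(counts):
    """Return (option, cumulative product) pairs in insertion order."""
    totals = []
    total = 1
    for option, choices in counts.items():
        total *= choices
        totals.append((option, total))
    return totals


# Every combination of every option is a distinct product.
TOTAL_PRODUCTS = prod(OPTION_COUNTS.values())

if __name__ == "__main__":
    for option, total in running_totals(OPTION_COUNTS):
        print(f"{option:<13}{total:>12,}")
```

Each extra yes/no option doubles the total, which is why the count grows so quickly once the optional extras are included.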

Now here is the killer contradiction. By volume and by value, 80% of everything we supply is covered by 2000-3000 products, so we give them individual product codes. The catalogue is more than 50 pages. However this 80% of products only gives us 20% of our profits. The bulk of our earnings come from the 20% "non-standard" product.

Now I am ready to confront my problem, "My name is EDI Eddy and my catalogue has been living a lie!".

Most EDI assumes ordering is done by product code reference. This means that for a customer, with a computerised (and EDI enabled) purchasing system, to order from us, they must set up a database with millions of entries. Even if this was achievable, problems still remain.

  • How does a purchaser find the correct product code in their database?
  • If it is difficult to order electronically, will they order manually or order from someone else?
  • When our range of products changes regularly, how do thousands of database entries get added and deleted?
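
To put a rough number on that last question, here is a sketch of the yearly churn. It assumes the figures given earlier (5 of 25 colors and 3 of 15 styles changing each year, everything else fixed) and that each retired color or style is replaced by a new one. The function name and parameters are made up for the illustration.

```python
def yearly_churn(sizes=750, colors=25, colors_changed=5,
                 styles=15, styles_changed=3, other_options=4 * 2 * 2 * 2):
    """Count catalogue entries retired and added in one yearly refresh.

    An entry survives only if both its color and its style survive.
    Returns (total entries, entries retired, entries added).
    """
    total = sizes * colors * styles * other_options
    surviving = (sizes * (colors - colors_changed)
                 * (styles - styles_changed) * other_options)
    retired = total - surviving
    added = retired  # same-sized catalogue, so additions match retirements
    return total, retired, added


total, retired, added = yearly_churn()
share_changed = retired / total
```

Under these assumptions roughly 36% of the entries are retired each year and as many again are added, so a customer's copy of the catalogue would need constant maintenance just to stay valid.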

So the reality is our EDI is limited to the low margin, less valued end of our product range. Over the years we have been constantly expanding EDI, handling higher message volumes, supporting more message formats and more delivery methods. The reality is EDI has been getting less and less important :(

Wednesday, 21 April 2010

It is an Ill wind that blows nobody any good - from Berlin to Haiti

For as long as I can remember I have been reading the phrase "EDI has its beginnings in the Berlin Airlift in 1948". The next bit of history usually jumps to the 1960s American transport industry. I read this again recently and for once it got me wondering. What electronic communication was there in 1948? What message was it and in what format, and how was it delivered? How was it developed and agreed? What was it about the Berlin Airlift that needed an EDI message?

The details took quite a bit of tracking down (I have included some links below), but the best source I could find for this bit of history is one Maj. Edward Guilbert who was a traffic manager in Berlin. On leaving the army he went to work... the transport industry.

To answer the questions above,

What electronic communication was there in 1948? Teletype.

What message was it, in what format, and how was it delivered? Manifest or Advance Ship Notice. No examples are believed to survive. It was radioed by teletype to Berlin from the plane's departure airport, as soon as possible after the plane's wheels left the ground.

How was it developed and agreed? You're in the Army now. You do as you are told. I don't know if the supplying airports and plane companies adopted the same procedures for their other business, but I doubt it. But it made clear what was possible.

What was it about the Berlin Airlift that needed an EDI message? In a word, "Bottleneck". Every last kilo of supplies was vital but it couldn’t just sit on the ground at the Berlin airports. At the rate of a plane every few minutes, parking space for planes would soon disappear. Even if planes unloaded and departed, hangars would soon fill up. The goods needed to be moved on, which meant the right people needed to be ready to receive the right goods.

There is a wonderful story about how they tried to stop the incoming pilots disappearing for a hot drink and a snack, by supplying mobile refreshment manned by pretty young frauleins, all to shave valuable minutes off the turnaround time.

It struck me that what was game changing about this was not Electronics or Message Formats. It wasn’t about Cost Saving. When you are fighting a Cold War, cost is no object. It seems to me that what made all the difference was Standardization. Different parties all did the same thing, in the same way, to speed up processes that would otherwise jam up and slow to a crawl.

The Berlin Airlift would make a good motion picture. The struggle to overcome adversity through perseverance and ingenuity. How the little guys succeeded by teamwork and coordination. I can see it now, “EDI – The Movie”. Well OK, maybe not.

When listening to news reports of the Haiti earthquake relief efforts, I was struck by the similarities in the problems they experienced at the airport. It seems there is nothing new under the sun.

Computerworld Article - 1910 Telegraphs to XML

ECommerce Google Books - The Maj. Edward Guilbert Story

Ecommerce Connexion Article - A short history summary of the Berlin Airlift

Wednesday, 25 November 2009

When do Standards become Standards ?

I want to promote good sense when I read it. Adam Bosworth has a very good article on standards. He is not talking about EDI standards, but technical and software standards in general. Yet everything he says I feel is applicable to EDI.

In summary, standards should be simple, readable, focused, precise, implemented, forgiving and free.

Adam's standards background is ODBC, AJAX and XML. These technologies are used so computer can talk to computer, servers and clients can talk to clients and servers, so Electronic Data can be... Interchanged.

Monday, 23 November 2009

Creating an FTP Script

In the last post I discussed using a web FTP service as an alternative EDI service, so I thought I should also provide some help with FTP scripts. You can use a graphical tool like Windows Explorer, for example by typing in the address box. However to eliminate the work of the human operator as far as possible, scripts are needed.

That said, the best first step is to make sure you can do everything manually. So drop to the command line. On a Windows system click Start > Run, and type CMD. Now try to replicate something like the session below, which shows the local machine prompts, the FTP server responses and your inputs.

C:\Documents and Settings\ACME>
C:\Documents and Settings\ACME>ftp
Connected to
220 ready.
User: ACME
331 Password required for ACME.
230 User ACME logged in.

ftp> cd /outbox
250 CWD command successful.
ftp> put acme-bbunny-321.txt acme-bbunny-321.tmp
200 PORT command successful.
150 Opening ASCII mode data connection for put acme-bbunny-321.txt.
226 Transfer complete.
ftp> rename acme-bbunny-321.tmp acme-bbunny-321.txt
350 File exists, ready for destination name.
250 RNTO command successful.

ftp> cd /inbox
250 CWD command successful.
ftp> ls
200 PORT command successful.
150 Opening ASCII mode data connection for file list.
226 Transfer complete.
ftp> get bbunny-acme-123.txt
200 PORT command successful.
150 Opening ASCII mode data connection for bbunny-acme-123.txt (999 bytes).
226 Transfer complete.
ftp> del test.tmp
250 DELE command successful.

ftp> quit
221 Goodbye.
C:\Documents and Settings\ACME>

This follows the procedure laid out in the previous post. Sending a file, listing files to be fetched and fetching them. If you can do this then you are well on your way. If not then any amount of program testing and debugging won't help you.

The next stage is some quick and dirty code just to get the job done. This example is in Python.

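from ftplib import FTP

host = ''  # fill in your FTP server address
user = 'acme'
password = 'password'
filestosend = ["acme-bbunny-321.txt"]

# Log on
ftp = FTP(host)
ftp.login(user, password)

# Send: upload under a temporary name, then rename, so the receiver
# never picks up a partly transmitted file
ftp.cwd('/outbox')
for filename in filestosend:
    tempname = filename.rsplit('.', 1)[0] + '.tmp'
    with open(filename, 'rb') as fileinput:  # storlines needs a binary-mode file
        ftp.storlines('STOR ' + tempname, fileinput)
    ftp.rename(tempname, filename)

# Fetch: download everything listed except temporary files
ftp.cwd('/inbox')
filestofetch = ftp.nlst()
for filename in filestofetch:
    if filename[-4:] != '.tmp':
        with open(filename, 'w') as fileoutput:
            # retrlines strips the line ending, so add one back
            ftp.retrlines('RETR ' + filename,
                          lambda line: fileoutput.write(line + '\n'))

ftp.quit()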

Does this work? If it does then well done, but your work isn't finished. This script has no error traps, no try statements. The 4 variable assignments at the beginning need to be parametrised in some way. (Exercise for the reader)
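
One possible answer to the exercise, offered as a sketch rather than a finished program. It assumes Python 3 and the /outbox and /inbox folder names from the session above. The function names, the command-line arguments and the choice to simply print failures are all illustrative.

```python
import argparse
import os
from ftplib import FTP, all_errors


def temp_name(filename):
    """Return the temporary upload name: same stem, .tmp extension."""
    return os.path.splitext(os.path.basename(filename))[0] + ".tmp"


def send_files(ftp, filenames, remote_dir="/outbox"):
    """Upload each file under a .tmp name, then rename it into place."""
    ftp.cwd(remote_dir)
    sent = []
    for filename in filenames:
        try:
            with open(filename, "rb") as source:
                ftp.storlines("STOR " + temp_name(filename), source)
            ftp.rename(temp_name(filename), os.path.basename(filename))
            sent.append(filename)
        except all_errors as error:  # FTP errors, OSError, EOFError
            print("send failed:", filename, error)
    return sent


def fetch_files(ftp, remote_dir="/inbox", local_dir="."):
    """Download every listed file except .tmp ones; return their names."""
    ftp.cwd(remote_dir)
    fetched = []
    for name in ftp.nlst():
        if name.endswith(".tmp"):
            continue  # still being uploaded by the other party
        try:
            with open(os.path.join(local_dir, name), "w") as target:
                ftp.retrlines("RETR " + name,
                              lambda line: target.write(line + "\n"))
            fetched.append(name)
        except all_errors as error:
            print("fetch failed:", name, error)
    return fetched


def main(argv=None):
    """Read host, user, password and file names from the command line."""
    parser = argparse.ArgumentParser(description="Send and fetch EDI files")
    parser.add_argument("host")
    parser.add_argument("user")
    parser.add_argument("password")
    parser.add_argument("files", nargs="*", help="local files to send")
    args = parser.parse_args(argv)
    try:
        with FTP(args.host) as ftp:
            ftp.login(args.user, args.password)
            send_files(ftp, args.files)
            fetch_files(ftp)
    except all_errors as error:
        print("session failed:", error)
```

To run it from the command line, add a final line that calls main(). Keeping the send and fetch steps in functions that take the connection as an argument also means they can be tried out against a stand-in object before pointing them at a real server.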

Sunday, 15 November 2009

How to use an FTP Host as an Alternative EDI Network

Web site hosting services are very competitive now. Easy to use and ever cheaper for both individuals and companies. These servers and systems can also easily provide FTP connections. Dozens of user logins and gigabytes of storage can be had for an annual cost of less than the purchase price of a desktop computer. With the aid of some relatively simple scripts your computer system can reach out across the internet and use these as a stripped down, bargain basement, alternative EDI (Electronic Data Interchange) network. Here's how...

Create a folder for each partner. Within each folder create 2 sub-folders, inbox & outbox.

You could call these sub folders something else, like upload & download, or simply in & out. However using the inbox/outbox terms helps to relate them to their function and models the familiar email arrangement.
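
If you have file system access on the host, the folder layout can be created with a short script. This is a sketch: the root folder name and the partner names (borrowed from the Tom, Dick and Harry example further down) are placeholders, and many hosting services will expect you to create the folders through their control panel or over FTP instead.

```python
from pathlib import Path


def create_partner_folders(root, partners):
    """Create <root>/<partner>/inbox and <root>/<partner>/outbox."""
    created = []
    for partner in partners:
        for box in ("inbox", "outbox"):
            folder = Path(root) / partner / box
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created


# create_partner_folders("ftproot", ["tom", "dick", "harry"]) gives:
#
#   ftproot/
#       tom/inbox     tom/outbox
#       dick/inbox    dick/outbox
#       harry/inbox   harry/outbox
```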

Assign each partner a user id & password and set their "home" folder. When they log on they should only be able to access their own folder.

To send you a file, the partner will log on and "put" (in FTP terms) the file in their outbox folder. To receive any files from you they will list ("ls") their inbox and "get" anything they find, then "del" it.

To send a file to a partner, log on and "put" the file in the partner\inbox folder for them to collect. Remember, as host you start from the root folder. To receive any files from your partners, list all outbox folders and "get" anything you find, then "del" it.

This is a simple arrangement, but there are some subtle potential gotchas.

  • All files sent and received should have unique names. If you send a daily file (for example orders.csv) and the receiver doesn't pick it up before the next one is sent, it will get overwritten. To avoid this the sender keeps a counter updated on the generating system and the number is incorporated into the file name e.g. orders123.csv
  • If both sender and receiver are logged on at the same time then it is possible to list a file that is only partly transmitted. To avoid receiving part files, the sender should "put" using a temporary name (typically with a .tmp extension), then "rename" it to its correct name. The receiver just ignores any listed .tmp file.
  • The FTP protocol has 2 transfer modes, ASCII & Binary. Most EDI files are text files. On Windows/Mac/Linux/Unix the end of a line (EOL) in a text file is marked by different characters (carriage return, line feed, or both). If sender and receiver have different EOL styles then this can cause problems. This is solved if the FTP is done in ASCII mode (which is usually the default). However for things like Excel files and JPEGs it would cause data corruption, so binary mode must be used.

It doesn't really matter which party is the host. If you are at the client end of the relationship and have to deal with several different hosted sites, the overhead for each additional FTP connection and session is small. Address, username & password. No complicated configuration.
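
The first two gotchas in the list above (unique names and temporary upload names) can be handled with a few small helpers. This is a sketch only: the function names are made up, and keeping the counter in a small text file is just one simple way of making it survive between runs.

```python
from pathlib import Path


def next_sequence(counter_file):
    """Read, increment and store a counter kept in a small text file."""
    path = Path(counter_file)
    current = int(path.read_text().strip()) if path.exists() else 0
    path.write_text(str(current + 1))
    return current + 1


def numbered_name(stem, extension, sequence):
    """Build a unique file name such as orders123.csv."""
    return f"{stem}{sequence}.{extension}"


def upload_names(final_name):
    """Return (temporary name, final name) for a put-then-rename upload."""
    stem = final_name.rsplit(".", 1)[0]
    return stem + ".tmp", final_name
```

For example, numbered_name("orders", "csv", next_sequence("orders.count")) gives a new name on every run, and upload_names() supplies the .tmp name to "put" under before the "rename".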

If Tom, Dick and Harry need to exchange files with each other, as well as yourself, then this can also be achieved. In traditional EDI the "to" & "from" envelope data is found by reading the beginning of the message. This method could be used but it would require each file to be moved to your system so it could be read and then immediately returned to the host server in the correct folder. A better way, which does not restrict the files to formal EDI formats, is to require the file names to be in the form from-to-count.extn e.g. tom-dick-99.txt or harry-tom-321.xls
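
Routing on the file name alone might look something like the sketch below. It assumes names contain no extra hyphens or dots, and that each partner collects from an inbox sub-folder of their own folder. The pattern, the Envelope type and the function names are all invented for the illustration.

```python
import re
from typing import NamedTuple

# from-to-count.extn, e.g. tom-dick-99.txt
FILENAME_PATTERN = re.compile(
    r"^(?P<sender>[^-.]+)-(?P<receiver>[^-.]+)-(?P<count>\d+)\.(?P<extn>\w+)$"
)


class Envelope(NamedTuple):
    sender: str
    receiver: str
    count: int
    extn: str


def parse_filename(name):
    """Split from-to-count.extn into its parts, or return None."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return None
    return Envelope(match["sender"], match["receiver"],
                    int(match["count"]), match["extn"])


def destination_folder(name):
    """Return the receiver's pick-up folder for a file, or None."""
    envelope = parse_filename(name)
    if envelope is None:
        return None  # not in the agreed form, leave it for a human
    return f"{envelope.receiver}/inbox"
```

Because nothing inside the file is read, the same routing works for a spreadsheet or an image as well as for a formal EDI message.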

Monday, 3 August 2009

Embedded EDI

I have just come across the phrase Embedded EDI. I think it fits quite nicely as a label for what I have been trying to articulate.

Some highlights...

  • "Are a CEO’s email or Text Messages more or less important than an X12 Shipping notice?"
  • "Why is EDI not subject to universal addressing via DNS or URI name-spaces?"
  • "CEO’s and children expect email to reach it’s destination, but Supply chain managers need private networks. Why?"

Good questions...

Sunday, 2 August 2009

I am Eddy

Eddy (fluid dynamics): The swirling of a fluid and the reverse current created when the fluid flows past an obstacle.

Thursday, 23 July 2009

I am not Eddy

I am not angry with every idiot.

Thursday, 28 May 2009

When Lost, it helps to remember where you want to go

I have the occasional high school student pass through my office on work experience. They do the rounds going to each department. When they come to me they have already spent time in the Sales and Purchasing departments. I am expected to give them a 30 minute overview of EDI (I don't just do EDI, but no one else understands it, or wants to understand it, so that is what they ask me to do). Then they move on to the help desk where they occupy them with tasks, like prepping desktops. They always like that, but I can't claim any are enthusiastically interested in what I have to say. I find the exercise helps to keep my feet firmly on the ground and head out of the stars, as they sit there wondering (despite my best efforts) what I'm talking about.

Well, the other day it was the turn of one of my IT colleagues' offspring. Keen to make a good impression, I was spurred on to think up a new and better explanation than I had used before. I was pleased with the results so I thought I would reproduce a version of it here.


When you enter the Sales Order Processing department, you will see people opening the mail and extracting customer purchase order forms. Some of them are hand written, some are computer printed.

There is also a fax machine dedicated to customer orders that spews out a steady stream of order forms. Again, some of them are hand written, some are copies of computer generated order forms.

We also have a team of people who take orders verbally over the phone.

Finally, some customer orders come in as emails. Usually as PDF or Microsoft Word attachments. All these orders are read and the details typed into our Computer system.

This raises 2 questions, and leads to a 3rd great big "What If" question.

Q1. Why aren't all orders emailed?

It is cheaper not to have labor deployed in the mail room, or manning the phones, or maintaining the fax line and fax machine. The cost of receiving emails is tiny in comparison. When you scale up to hundreds, thousands, tens of thousands a day, the cost graph for emails rises far less steeply. The same is true for our customers. Most of them are companies and organisations. It is cheaper for them not to pay for postage, or phone calls.

So I repeat the question. Why aren't all orders emailed? The answer? I don't know, you should ask the customers. But the point is, whatever the real answer (and it might be different for different customers), take note of how difficult it is to change our customers' behaviour. We can't afford to turn away business by not accepting the orders, and we don't want to make it hard to order from us.

Q2. Why do all orders have to be re-typed into our Computer system?

Every product is made up of many components and materials. All of which have to be purchased. Every product requires many different sorts of resources, people and machines. All have to be planned and instructed what to do. Controlling all these aspects takes a complicated Computer program (called an ERP system) and that is why they have to be entered.

Q3. What if, instead of having to re-type everything, you could click and drag the email attachment, drop it on the ERP icon on your desktop, and the software processed it for you?

I will tell you "what if". If you could do that you will have achieved what businesses and organisations have been trying to achieve for decades. This is the ultimate aim of EDI.

Why is this so difficult?
To a computer, text, is text, is text. How does a computer tell one piece of text is a product description and another is an address, one is a quantity and another is a price, one a delivery date another is a created date? To humans this is easy. For computers this is hard.

Ever tried to open a Microsoft Word document with Notepad? That is what the file looks like to a programmer. What do all the non-text characters mean? Only those who have signed a non-disclosure contract with Microsoft know, and none of them have permission to tell you.

The first thing you have to do is agree a document format that is open to everyone. Then everyone has to agree where to place each bit of data or how to label it. In short, globally speaking, we can't agree. Many have been expecting one of the many formats to emerge as the dominant one, like the way everyone uses MP3 instead of WMA or WAV, but this hasn't happened.
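
A small sketch of what agreeing "how to label it" buys you. The order details and the field names (product_code, quantity, price, delivery_date) are invented for this illustration and are not taken from any standard.

```python
import csv
import io

# The same order twice: once as free text, once with agreed labels.
FREE_TEXT = "Please send 12 of the blue ones at 4.50 each by 1 June"
LABELLED = (
    "product_code,quantity,price,delivery_date\n"
    "WID-BLUE-01,12,4.50,2009-06-01\n"
)


def read_labelled_order(text):
    """Turn labelled rows into dictionaries a program can act on."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "product_code": row["product_code"],
            "quantity": int(row["quantity"]),
            "price": float(row["price"]),
            "delivery_date": row["delivery_date"],
        })
    return rows
```

There is no equally short function for FREE_TEXT. A program has no reliable way of knowing that 12 is the quantity, that 4.50 is a price, or which product "the blue ones" refers to.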

The next thing is product identity. We distribute a catalogue describing all our products and product options along with product codes (you see these bar-coded on the packaging). We encourage our customers to quote these in the order so that there is no misunderstanding. How do our customers (and their computer systems) know this information? If they order the same product from different suppliers, each will have their own code.

At least sending the document is easy. I mean everyone uses email, right? How else would you send electronic data from A to B? Er.. hmm... well... No. We have been moving electronic data around since before email and the Internet. There are dedicated EDI networks. There are Extranet web sites. Third party hosted web app clearing houses. There are different ways of securing emails. There are different encryption solutions. Security Certificates. Web of trust. etc. etc. etc.

So what are we to do? We could pick one solution and tell all our customers to communicate in that way. But remember these are the same customers, some of whom are still using post and fax. Remember how difficult it is to change some customers' behaviour? Some of the big customers might prefer a different solution. We won't turn them away, but very quickly we find ourselves supporting many different forms of EDI. The cost of implementing another one has to be weighed against the volume of business (and cost saving) we can expect in the future.

Now turn your point of view around. When we order from our suppliers, we become the customer. The complexity just doubled. I have only talked about orders, but there are many other documents that are exchanged in business.


Are you Lost? Can you remember where we were heading? Any bright ideas?

Friday, 17 April 2009

What if EAN/UCC numbers didn't exist?

I was in the process of setting up a mapping for a new EDI (Electronic Data Interchange) customer. One of the first things I had to do was to take the sender's ID from the message envelope and create a look-up link to the customer's account on our ERP system. Now just lately I have become obsessed by leaner, simpler EDI. So I looked at this "simple" process with new eyes.

The sender's ID, and by extension any party ID, has to be agreed in advance. This is because it is no good if Acme Inc uses an ID of ACM001 if it is also used by The Association of Comics & Mimes. The receiver won't know which party it is. Now a human will be able to deduce who it is from if the message includes a full address and this is displayed for manual processing. But EDI is not about manual processing.

One solution is for the sender to use the receiver's allocated ID for the sender (themselves). But they won't know this unless they communicate first.

The most common solution is to use an ID allocated by a neutral and authoritative third party. The most commonly used are EAN/UCC numbers. However these are not very ubiquitous. Ask any big company CEO or any small one-person trader what their company's EAN/UCC number is and you will get a blank look. Ask them what their tax number is and they would be able to give you an answer quite quickly. Tax numbers are not a universal solution, as there are occasions when they would not be applicable.

But as a receiver of unsolicited messages, a new, never before seen EAN/UCC number is not much use. They are not routinely stored on our ERP customer/supplier database. I can't do a Google search to tell me who is 555987654321 or what is Acme's EAN/UCC. I am stuck with asking a potential EDI partner what their EAN/UCC number is. We are back to prior communication again.

The EDIFACT format standard, NAD segment, lists 300+ organisations that can be used to supply ID numbers. Tax offices, banking organisations and standards bodies dominate (EAN/UCC is number 9). None of those listed stood out to me as a better solution, so I guess EAN/UCC dominates as it is the least worst.
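
For readers who have not seen one, this is roughly how the party ID and its issuing organisation sit in a NAD segment. The sketch assumes the default EDIFACT separators and no release characters. The segment in the usage note is made up, reusing the number quoted above, and the AGENCY_NAMES table holds only the two codes mentioned in this post.

```python
# Only the two agency codes mentioned in this post.
AGENCY_NAMES = {"9": "EAN/UCC", "ZZZ": "Mutually defined"}


def parse_nad(segment):
    """Split a simple NAD segment into its party identification parts.

    Assumes + between elements, : between components and ' as the
    segment terminator, with no release characters in the data.
    """
    elements = segment.strip().rstrip("'").split("+")
    if elements[0] != "NAD" or len(elements) < 3:
        raise ValueError("not a NAD segment with a party identification")
    components = elements[2].split(":")
    components += [""] * (3 - len(components))  # pad missing components
    return {
        "party_qualifier": elements[1],
        "party_id": components[0],
        "code_list": components[1],
        "agency": components[2],
    }
```

Calling parse_nad("NAD+BY+555987654321::9'") returns the party ID 555987654321 with agency 9. The parsing is the easy part. The receiver still has to find out who that number belongs to.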

At this point I realised something. These messages from this sender are not being delivered by a traditional EDI method and as a result our system already knows who they are from. You see they are already linked with an Internet domain name. Now this strikes me as a very good pre-existing "identity" code. It is unique. It is well known. It is in common use. For me it passes the "Fit for purpose" test and the "Ubiquitous" test. I searched the EDIFACT code list but apart from the catch all ZZZ (= Mutually Defined), none seemed applicable.
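
Using the domain name as the identity could be as simple as the sketch below. The look-up table, the account code and the acme.example domain are all hypothetical. In practice the table would be a field on the ERP customer record.

```python
# Hypothetical look-up table: Internet domain name -> ERP customer account.
ACCOUNTS_BY_DOMAIN = {
    "acme.example": "CUST-0042",
}


def domain_of(address):
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].strip().lower()


def account_for_sender(address, accounts=ACCOUNTS_BY_DOMAIN):
    """Find the ERP account for a sender, or None if the domain is new."""
    return accounts.get(domain_of(address))
```

A message from an unknown domain returns None, and unlike an unknown EAN/UCC number, the domain itself is something that can be looked up.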

But then where in all the EDIFACT standard does it refer to a URL, a URI, or an email address? I guess the relevant parts (as they are pretty fundamental) probably date from the early history of the standard, circa 1989. I think I sent my first email on JANET around about then. (That is much earlier than anyone else I know. Whoops! I guess that ages me.)

It is twenty years later now and I think EDI has some catching up to do.

Monday, 2 February 2009

Million Dollar Traders, Nickel & Dime EDI

They don't seem to do "Documentaries" any more. It is all "Reality TV" these days, and I have mostly had enough of them. I wish shows like "The Apprentice" would show more details of the problem solving, less of the problem personalities. But such is the rarity of Business related material I gave this new show a go.

Million Dollar Traders - the premise: take some ordinary people who show some promising aptitude, train them intensively for a short period, then let them loose with real money and see how they perform as a Hedge Fund.

Unfortunately I didn't learn much about the workings of The City, Finance and Hedge Funds. I learnt more than I wanted to about ordinary people under stress.

What caught my eye was the technology on display. Each trader had 2 or 3 computer monitors showing large amounts of numbers and graphs. Real time data was so important it was considered a major faux pas not to be at the desk the moment the markets opened. They were constantly crunching numbers with calculators as they balanced (or hedged) risk trying to find a profit margin. All this was happening as the Credit Crunch Tsunami hit, and banks crumbled into chaos. After 3 weeks I think the best trader made about 1% gain, compared with an industry average of 5% loss. My rough guess is that this represents £2,000.

Throughout this time we watched as the trainee traders weighed their investment decisions, the tension and drama was built up. We sat on the edge of our seats as they finally committed themselves to make a deal....

...and then they picked up the phone...


No "Accept" button to click. No https protocol session. But a conversation with a real person. A request to buy or sell (go long or short in the jargon). A query on the price and a confirmation on the total cost. They then got out of their chair with a piece of paper (presumably some sort of record of the deal), walked across the room and got it stamped in some sort of machine that made a satisfying "kerching" sound.

At first I put this down to artistic license and the program makers wanting to make more interesting imagery. Like in the film Independence Day, where they bring down an alien war machine with a software virus. As unlikely as it is that the aliens have no anti-virus software, they leave out the incredibly quick development of the USB port to NBSA port adapter hardware (Never Before Seen Alien port).

Then in the last episode a strange thing happened. The boss was upset. He was a REAL trader. He knew how it was. The new guys were not coming up to scratch. He didn't like their attitude. It was all wrong. So they started discussing acceptable telephone manners when on the phone to a broker. At issue was the fact that some of the traders were spending too much time saying "hello", "please" and "thank you". He insisted that the Brokers will be laughing at them the minute the phone was put down. If the traders were not short and blunt they would get no respect. The brokers would be less likely to do their job as efficiently.

My jaw dropped. It was for real. The boss was prepared to openly admit a willingness to trample all in his way in the focused pursuit of money. I understood that. It was in the nature of the job. Yet he was prepared to rely on one fallible human correctly interpreting verbal instructions communicated by another fallible human. Indeed, in the first episode one guy accidentally got his "buy" and "sell" mixed up. In the 20 seconds it took to correct his mistake, he lost money.

The labour cost of those transactions was just silly. Yet the volume of trading, city wide, is huge. I know the term "EDI" is not a trendy buzzword, but where was the - Electronic - Data - Interchange?

I know the world of Finance has taken a heavy PR knock recently. I have heard the view expressed elsewhere that those in charge didn't really understand what was going on. But so long as the money kept rolling, questionable practices remained under the radar. If this TV program is representative then to me it would seem to confirm this view.

Here is an important life lesson I learnt as a small child watching The Wizard of Oz.

Despite what the Wizard says, don't ignore the man behind the curtain. His presence is telling you something!

Tuesday, 4 November 2008

Electronic Data Interchange Needs Electronic Paper

There is probably a word for it. You know, when you are struggling to express an idea and try several times without being satisfied that you are getting your point across. Or indeed not sure what your point is. Then someone else hits the nail on the head. Then you feel the ping of recognition, the swell of confirmed pride and the annoying jealousy of "I wish I had said that".

I want EDI to be easier. Correction, I want EDI to be easy.

In a previous article I was poking fun at the format explosion and buzz word bingo that goes on in the EDI world. The idea that standards are good so we need more of them. The dis-ing of everything that has gone before as "legacy". The constant re-invention of the wheel without learning the lessons of history.

As a throw-away remark I said in a future article I would announce a solution based on "Facsimile Technology". I knew what I was referring to but up until now I hadn't been able to express it clearly. Then I read it elsewhere.

At The Register, Chris Mellor has an Article entitled "The latest EDI money saver? Paper invoices - Use humans, save money".

I suspect the title is being deliberately provocative. He describes a system where a Scanner is used with OCR to create "electronic copies" of invoices that are machine readable. This is not new and there are other scanner/OCR solution providers out there. If you can't get all your suppliers to send EDI invoices, it is a great way to deal with paper invoices.

The people selling this system explain their thinking...

EDI leaves IT departments perpetually recoding (changing standards, used in differing ways, new formats, different communication protocols).
No all-embracing EDI standard is going to emerge.
The number of paper invoices still being produced after all these years of EDI effort - is massive.

... so if EDI is a failure, accept it and learn to deal with the paper.

I don't want EDI to be a failure, yet I can't believe in a "paperless office". So let's study what paper has going for it.

Paper is ubiquitous, cheap and can be put to many uses.
It can be supplied by many sources and what differences exist (size, weight, color) don't cause tie in.
Organisations are already adapted to handle it.
Humans interface with it easily.

Now look at it the other way, as a sender of invoices, and accept that some receivers will simply convert (map) the files to paper. Is there an electronic form that EDI enabled receivers can use, yet is just as good as paper for those that aren't enabled? To come close I believe the receiving user should be able to view a list of invoices in a computer folder, like any shared folder. A double click should instantly display the same image as the old paper format. Click print, and it prints. In the folder view, highlight several, right click, select print, and several print.

HTML & PDF files do this. XML with style sheets comes close. ODF & OOXML are related to XML. DOC is closed, secret and proprietary.
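
As a sketch of why XML with a style sheet comes close: the same file can be displayed by a browser and read by a program. The invoice layout, the element and attribute names and the invoice.xsl style sheet are all made up for the illustration.

```python
import xml.etree.ElementTree as ET

# A made-up invoice. The xml-stylesheet line tells a browser how to
# display it; invoice.xsl is a hypothetical style sheet, not shown here.
INVOICE_XML = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="invoice.xsl"?>
<invoice number="INV-1001">
  <line code="WID-01" quantity="3" price="4.50"/>
  <line code="WID-02" quantity="1" price="10.00"/>
</invoice>
"""


def invoice_total(xml_text):
    """Return (invoice number, total value) read from the XML."""
    root = ET.fromstring(xml_text)
    total = sum(int(line.get("quantity")) * float(line.get("price"))
                for line in root.findall("line"))
    return root.get("number"), total
```

A receiver who is not EDI enabled double clicks the file and sees a formatted invoice. A receiver who is enabled runs something like invoice_total() on exactly the same file. Real money handling would want decimal arithmetic rather than floats.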

The file types that don't do this are X12, Edifact, Eancom, Tradacoms, Odette (sorry traditional EDI).

The file format is only part of the story. The delivery mechanism needs to be as universal as snail mail. Is there an electronic protocol that EDI enabled receivers can use, yet is just as good as snail mail for those that aren't enabled?

Email is close to this. However it is a bit like using post cards with no certainty of delivery.

In my view EDI needs to be so simple and fault tolerant that it can be done with a laptop and office productivity software. That may mean we have to make changes to our office productivity software. But we definitely need to make changes to our EDI.

Friday, 15 August 2008

Freedom !

I have just read Craig's post on EDITalk and was motivated to leave a long comment. So I have decided to re-post it here.

Quoting Braveheart is like a red rag to a bull to me. I have got to respond.

I am going to go on a bit about history here, but bear with me. I will come back to EDI (Electronic Data Interchange) at the end.

For the average Joe, the freedom Wallace (and Mel Gibson) was fighting for was to be lorded over by a Scotsman (Bruce) rather than an Englishman (Edward). In both cases Joe remained a peasant/serf with no right to land ownership, an obligation to pay taxes (without representation) and with little hope of an evenly applied "rule of law". Is the race of the King significant?

The attempts by various English Kings of this period (1200-1350 ?) to acquire Scotland were a MAJOR historical cock up that delayed the creation of a unified nation (Britain) by hundreds of years. For all the historical inaccuracies of the Braveheart film, one point it did correctly portray was that the Scottish ruling elite owned significant landholdings in England.

Inter-marriage and cross-cultural exchanges were bringing both countries together. If things had been left alone and there hadn’t been the many wars, unification might have happened anyway by joint inheritance. This eventually happened in 1603 when Elizabeth I of England was succeeded by James VI of Scotland (her father's sister's great-grandson - or first cousin, twice removed - I think!).

Even then, things were not settled. The last big rebellion began in 1745 and ended in 1746 when "Bonnie Prince Charlie" was defeated at the battle of Culloden. This event holds an emotional place in the culture of Scots and is seen by many as the "death of nationhood". However in this battle there were more Scots fighting on the "English" side than English. The alternative view is that this event marks the end of "tribalism" on this island and the rise of the modern Nation State.

Back to EDI.

From this perspective the freedom to choose a format looks like a freedom to choose to be enslaved by a ruling elite.

The freedom to depart from the standard is the freedom not to be bothered with the law - that is for little people (or suppliers).

The freedom to reject the established standards and formats completely and develop your own is like sailing off to the New World in hope of a better life. You know it is not going to be a bed of roses but you think it is the future.

So how do we avoid repeating the mistakes of history? I think it is important to understand what really matters and makes a difference to our lives.

A N Other encoding format is no better than one we already support. The new one might be easier to utilise but it requires effort to get there from here. Using what we have is easier.

A single dominating format that was accepted by all, would produce future savings and improvements. This would be disruptive to the status quo. So to succeed it would need additional benefits beyond simple convergence, or we are in for a long wait.

Monday, 11 August 2008

CSVML - Accept No Compromise

Can't decide between XML and CSV? I have the answer, and it is not JSON as I previously thought.

I have seen the light.

The answer is here. Hilarious! (well it is hilarious if you are a geek).

In a future post I shall show how the future of EDI (Electronic Data Interchange) is a merger of Edifact, X12 & Tradacom by using Facsimile technology...

Sunday, 20 July 2008

Alternative EDI Formats Part II – JSON & Protocol Buffers

In the previous post I wrote how a large amount of EDI (that is Electronic Data Interchange in the widest sense) is done, without using a strict formalised standard, using CSV formats. Now Google has released details of how they execute server-to-server/program-to-program message interchange using Protocol Buffers. You won’t see the term EDI anywhere on Google but then the term doesn’t have a sexy web 2.0 image.

Google rejected the use of XML. I am all for that. To be fair, I think this is more to do with the desire for a binary format for super fast, super scalable encoding and decoding. Inter-company EDI is universally text based. I can’t see that changing.

The first thing I noticed about the .proto files is their similarity to JSON. Their use seems to have pre-dated the popularisation of JSON. In other areas I have seen Google use YAML for similar definition purposes.

The .proto files are not message files. They are not sent as part of a message, ever. They are used to automatically compile programs to handle messages in the format defined by these files.

Now this struck me, because I think this is one area where CSV beats traditional EDI standards. That first row, of column headings, is like the file definition. If a trading partner adds new columns (or removes columns, or moves columns) the next time he sends the same type of message, it doesn’t matter. We don’t need to agree beforehand. The receiver can identify which cell is which piece of information by locating the column heading position.
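A minimal sketch of that idea in Python, using the standard csv module. The column names and sample rows are borrowed from the address book example in the previous post, and the function name read_by_heading is made up for illustration.

```python
import csv
import io

def read_by_heading(text):
    """Return each row as a dict keyed by the column headings in row one."""
    return list(csv.DictReader(io.StringIO(text)))

# The same message sent twice. The second time the sender has moved the
# columns around and added a fax column without telling us.
first = "name,id,email,phone\nJodie Foster,1,,555-1234\n"
second = "id,phone,name,fax,email\n1,555-1234,Jodie Foster,555-1235,\n"

for message in (first, second):
    for row in read_by_heading(message):
        # The receiver asks for fields by heading, never by position.
        print(row["name"], row["phone"])
```

Both messages print the same name and phone number, because the receiver never relied on where the columns were.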

Stripping the .proto example down to equate it with our first simplified JSON message data from the previous post, we get the following,

[['Jodie Foster',1,'','555-1234'],
['Sigourney Weaver',2,'','555-9876'],
['Drew Barrymore',3,'','555-2468']]

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
  repeated PhoneNumber phone = 4;
}

The field list is in a [Modifier – Type – Field Name – Sequence] format. Modifier and Type wouldn’t make much sense in JSON which is not restrictive in its type usage. Incorporating the sequence number into our JSON definition section gives us a useful ability.


MessageObject.definition['name'] returns 0
Or, MessageObject.definition.name returns 0
MessageObject.data[0][MessageObject.definition['name']] returns Jodie Foster

Now we have the same ability to cope with our trading partners adding, moving and removing fields without the format losing its meaning.
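Here is a rough Python sketch of the same look-up. It assumes the definition section is an object mapping each field name to its position in a row, which is my reading of the example above; the helper name field is invented.

```python
import json

# Assumed layout: 'definition' maps each field name to its position in a row.
message_text = """
{"definition": {"name": 0, "id": 1, "email": 2, "phone": 3},
 "data": [["Jodie Foster", 1, "", "555-1234"],
          ["Sigourney Weaver", 2, "", "555-9876"],
          ["Drew Barrymore", 3, "", "555-2468"]]}
"""

def field(message, row_number, name):
    """Look a value up by field name instead of by hard-coded position."""
    position = message["definition"][name]
    return message["data"][row_number][position]

message = json.loads(message_text)
print(field(message, 0, "name"))    # Jodie Foster

# A partner who moves or drops columns only has to send a matching definition.
reordered = {"definition": {"phone": 0, "name": 1},
             "data": [["555-9876", "Sigourney Weaver"]]}
print(field(reordered, 0, "name"))  # Sigourney Weaver
```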

<aside> Did you notice Google started numbering at 1 and not 0? What is that about? That is Muggle thinking! </aside>

What happens when we expand the phone field into a sub-table like before? On its own this sub-table would have a definition of,


but we can't just slot this in and replace the existing phone field definition because we would lose the positional data. What Protocol Buffers does is list the definitions separately.

'data':[['Jodie Foster',1,'',
['Sigourney Weaver',2,'',
['Drew Barrymore',3,'',

returns 555-1235[0][a][2][c]
returns fax

In this way the sender can omit any fields they like and the field sequence is no longer important. The receiver can still parse the message and extract the data segments. The message file size is kept to a minimum.

This is not JSONML (although that is interesting in its own right). This is about efficiently transporting a (potentially large) list of data objects of the same type.

Thursday, 17 July 2008

Alternative EDI Formats Part I – CSV & JSON

I have been meaning to make this post for a long time, then Google came along with Protocol Buffers and the world moves on. So in this post I am going to outline how CSV files are used and how I thought JSON would be an improvement. In another post I will write about what I think can be learnt from Protocol Buffers.

A lot of data is communicated from machine to machine by CSV file format. It might not be strict EDI but it is electronic data interchange. It almost feels like an uncomfortable little secret no one likes to talk about (OK I admit it. I am trying to avoid the Elephant cliché).

To show what I mean, look at the number of responses to these keyword searches on Google. I know it isn't an accurate measure (compare Tradacom with Tradacom & EDI !?!?) but this is just for indicative purposes.

Keywords          Number of Google Links
X12               46,400,000
X12 & EDI         295,000
EDIFACT           802,000
EDIFACT & EDI     241,000
Tradacom          4,410
Tradacom & EDI    5,350
XML               650,000,000
XML & EDI         451,000
JSON              8,680,000
JSON & EDI        21,000
CSV               52,400,000
CSV & EDI         1,040,000

Note that CSV outranks all the other terms when combined with EDI. It even outranks the unqualified EDIFACT search - the ‘UN’ standard for EDI.

Why? Well CSV is easy. It is human readable. It can be output from spreadsheet programs. Most of all, the columns and rows closely resemble the way data is stored in RDBMS tables which is the destination of most EDI data.

Taking inspiration from Google’s Protocol Buffer example, an address book could be represented as follows…

Jodie Foster,1,,555-1234
Sigourney Weaver,2,,555-9876
Drew Barrymore,3,,555-2468

All the programmer needs is a ‘splitting’ function to slice the file up, first by carriage returns, then by commas. In JSON format this same data may be represented as follows…

[{'name':'Jodie Foster', 'id':1, 'email':'', 'phone':'555-1234'},
{'name':'Sigourney Weaver', 'id':2, 'email':'', 'phone':'555-9876'},
{'name':'Drew Barrymore', 'id':3, 'email':'', 'phone':'555-2468'}]

MessageObject[0].name returns Jodie Foster

However the file size has just ballooned. To overcome this, it could be represented in JSON another way to produce a much smaller file…

{'definition':['name','id','email','phone'],
'data':[['Jodie Foster',1,'','555-1234'],
['Sigourney Weaver',2,'','555-9876'],
['Drew Barrymore',3,'','555-2468']]}

MessageObject.definition[0] returns name
MessageObject.data[0][0] returns Jodie Foster
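To put rough numbers on "ballooned", here is a small Python sketch that builds the three representations from the same rows and prints their lengths. It assumes the compact form is a 'definition' list of field names followed by the bare 'data' rows, and it strips optional whitespace from the JSON so the comparison is fair.

```python
import json

fields = ["name", "id", "email", "phone"]
rows = [["Jodie Foster", 1, "", "555-1234"],
        ["Sigourney Weaver", 2, "", "555-9876"],
        ["Drew Barrymore", 3, "", "555-2468"]]

# CSV: one line per row, no field names repeated.
as_csv = "\n".join(",".join(str(v) for v in row) for row in rows)

# JSON, first form: every record repeats every field name.
as_objects = json.dumps([dict(zip(fields, row)) for row in rows],
                        separators=(",", ":"))

# JSON, second form: field names stated once, then bare rows.
as_table = json.dumps({"definition": fields, "data": rows},
                      separators=(",", ":"))

for label, text in (("CSV", as_csv),
                    ("JSON objects", as_objects),
                    ("JSON definition + data", as_table)):
    print(label, len(text))
```

With only three rows the saving is modest, but the repeated field names in the first JSON form grow with every record while the definition is paid for once.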

Now suppose Ms Foster is good enough to give us her mobile and fax number in addition. The ‘phone’ field becomes a list. For the CSV file, another delimiter is needed.

Jodie Foster,1,,555-1234/555-777/555-1235
Sigourney Weaver,2,,555-9876
Drew Barrymore,3,,555-2468

But what if we want to hold phone number type as well (home, mobile, office, fax etc.)? We have 3 options…
1. Add another field, also sub-delimited, where the sequencing matches the other field: 555-1234/555-777/555-1235,home/mobile/fax
2. Turn the ‘phone’ field into a compound field: 555-1234]home/555-777]mobile/555-1235]fax. The column heading becomes phone/type.
3. Create a separate table for the fields. Rows in this new table need a unique identifier linking them to rows in the original table.

At this point the CSV format is beginning to creak. Beyond 1 nested table, options 1 & 2 will require ever more different delimiters. So let us concentrate on option 3. In isolation this new sub-table would look like this,


These files can be sent separately. If they are to be combined into 1 message then we need to indicate in some way what table each row is part of. Typically this is done by reserving the first column. In this example it could contain phoneheader-definition, phoneheader-data, phonedetail-definition, phonedetail-data.
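A sketch of how a receiver might pull such a combined file apart, in Python. The four tags in the first column are the ones suggested above. The detail columns (id, phone, type) and the function name split_tables are assumptions made for the sake of the example.

```python
import csv
import io

# Hypothetical combined message. The tags in the first column come from the
# text above; the detail columns (id, phone, type) are assumed for illustration.
combined = """\
phoneheader-definition,name,id,email
phoneheader-data,Jodie Foster,1,
phonedetail-definition,id,phone,type
phonedetail-data,1,555-1234,home
phonedetail-data,1,555-777,mobile
phonedetail-data,1,555-1235,fax
phoneheader-data,Sigourney Weaver,2,
"""

def split_tables(text):
    """Group rows by table, using the reserved first column as the tag."""
    headings, tables = {}, {}
    for tag, *cells in csv.reader(io.StringIO(text)):
        table, kind = tag.rsplit("-", 1)
        if kind == "definition":
            headings[table] = cells
            tables.setdefault(table, [])
        else:
            # Assumes each table's definition row arrives before its data rows.
            tables.setdefault(table, []).append(dict(zip(headings[table], cells)))
    return tables

tables = split_tables(combined)
for detail in tables["phonedetail"]:
    print(detail["id"], detail["phone"], detail["type"])
```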

How would we represent this in our JSON format?

'data':[['Jodie Foster',1,'',
['Sigourney Weaver',2,'',
['Drew Barrymore',3,'',

MessageObject.definition[3][0] returns phone
MessageObject.data[0][3][2][0] returns 555-1235
MessageObject.data[0][3][2][1] returns fax
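Those look-ups can be checked with a short Python sketch. The message below is an assumed reconstruction in which the fourth field of each row is itself a small table of [number, type] pairs. Only Ms Foster's numbers are given in the text, so only her row is included.

```python
import json

# Assumed reconstruction: field 3 of each row is a nested table of
# [number, type] pairs, and its definition is nested in the same position.
message = json.loads("""
{"definition": ["name", "id", "email", ["phone", "type"]],
 "data": [["Jodie Foster", 1, "",
           [["555-1234", "home"], ["555-777", "mobile"], ["555-1235", "fax"]]]]}
""")

print(message["definition"][3][0])   # phone
print(message["data"][0][3][2][0])   # 555-1235
print(message["data"][0][3][2][1])   # fax
```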

While this encodes and represents the same message, is it better than CSV?
It is more extendable, it is slightly bigger, and it is probably just as human readable and just as machine readable. I already thought JSON was a good candidate for being the next CSV for EDI. In the next post I will write about how, taking inspiration from Google’s Protocol Buffers, I think it can be improved further.

Wednesday, 9 July 2008

Green Coffee XML

I am not kidding (pdf). Some might think this is great. Some might think it shows how wonderful XML is. I don't. To me it represents a lot of what is mixed up about EDI (Electronic Data Interchange). I want to make 2 points...

What is so special about Green Coffee that it needs its own schema?

  • Well reading the docs it seems coffee dealers are a bit fussy about defining when ownership of the product and ownership of the risk (associated with product delivery) is transferred. So they have 9 different order types.
  • As well as the buyer and seller, they need to be precise about the Broker and the Shipper.
  • The quality of the product is defined by a standard and is reflected in the product codes.
  • Pricing can be by formula.
  • Unit of measure is usually Kgs but when it comes to weighing coffee it seems to be important who weighs, when, and who pays for the weighing. I count 8 weighing types.
  • The journey coffee makes can be long and the value of the coffee at different stages changes so it seems the "place of tender" is important. A simple "delivery date" is not precise enough and must be qualified.

Phew! Complicated. But excuse me. Is any one of these points unique to coffee? Maybe the combination is unique. Maybe it is more sophisticated than Acme retail EDI. But what does it gain us to reject all that has gone before in 60 years of EDI and create new EDI ghettos?

I hope they didn't. I hope they just defined some extra tags and specified some extra attribute values, and added them on to some existing, already utilised and proven XML order standard. Which brings me to my next point.

How (for the love of coffee!) can I implement this?

I went in search of the technical details. The PDF document listed 4 XML Appendices on the contents page. They seem to be missing from the web. I went to the root URL and clicked around. I couldn't even find my way back to the document. I used Google to search the site - zilch. I used Google to search the web for "Green Coffee XML", no luck.

How can you expect a schema to be used if you won't tell anybody the details? If you want it to succeed make it freely available! Have you not heard of Peer Review?

Saturday, 28 June 2008

Is this what thinks is EDI? - Revisited

2 CDs with 7 million names and addresses of children and parents, some with bank details, are put in the post and go missing. When I heard about this story I worried about the little guy. Well now the UK Independent Police Complaints Commission have investigated and released their report. It is well worth a read (pdf). All 61 pages and 282 paragraphs.

At the time the Minister was quick to publicly blame a "junior official" not following the "rules".

But what if there are no rules? Or too many rules? Or rules that constantly change? Or no one responsible? Or responsibility shared by too many senior people? Or if everyone is responsible it means no one is responsible?

It looks to me like the little guy was trying his best. See paragraph 130:

He forwarded this email to Employee J in IMS and asked him to provide the 12 records as requested. Employee F included the following explanation in his email to Employee J:
…All we wanted was for NAO to realise exactly what they were
asking for, i.e. the scan data is live records of seven million Chb
customers when they only want to look at a dozen cases from
the scan. More importantly we needed to get the assurance of
how they would securely handle the discs containing the data
and how they would dispose of them once they had completed
the checking.
Obviously NAO should automatically realise this confidential
data has to be protected and no doubt they would do so.
However we needed something more than a verbal request to
ensure we had the paperwork to back up the request, things do
get mislaid and imagine the uproar if the discs containing the
ChB customer data went astray and turned up where they
shouldn’t – the long knives would be out. At least we would be
covering ourselves by getting the right assurance.

Wednesday, 2 April 2008

Why isn't EDI easier? Part I - One Standard to Rule Them All

EDI (Electronic Data Interchange) really should be less difficult than it is. If I am a start-up company and I want to purchase timber, metal, paper, widgets or some other commodity item, EDI is just too hard.

The overhead of establishing a relationship, agreeing a standard, and testing communications is huge. EDI is supposed to reduce cost but if it results in a supplier tie-in then it will produce unwelcome influences.

If it takes a department of expensive-to-employ / prickly-to-manage IT geeks to support, then this "cost saving" thing called EDI just got too expensive.

The first thing that springs to most people's minds is Standards. How many ways do we need to represent a Purchase Order? Why are there many Standards? Wouldn't it be easier to just all agree on one standard? The answer to the last question is, not necessarily.

To explain this, consider the classic example of Purchase Order Delivery date. On a multi-line purchase order, a customer will probably want all the items delivered together. But while some customers will send a delivery date in the order header section of the message, some will send delivery dates in the order line sections. Some will send both. Some will have differing dates in the detail line section. Some customers will omit dates altogether indicating the goods are required ASAP.

Don't stop me, I'm on a roll...

Sometimes it will be appropriate to send an Earliest and a Latest date range. This could be in either section. Given a free text field some will write 9/11/01, some will write 11/9/2001 and some computers will generate 20010911. By the way, not everyone in the world agrees what year it is or how many months there are.
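To see how slippery the free text case is, here is a small Python sketch that tries each of those three strings against a handful of common layouts and lists every date it could mean. The list of layouts is my own choice and is nowhere near complete.

```python
from datetime import datetime

def candidates(text):
    """Return every date the text could mean under a few common formats."""
    formats = ["%d/%m/%y", "%m/%d/%y", "%d/%m/%Y", "%m/%d/%Y", "%Y%m%d"]
    found = set()
    for fmt in formats:
        try:
            found.add(datetime.strptime(text, fmt).date())
        except ValueError:
            pass  # the text does not fit this format
    return sorted(found)

for text in ("9/11/01", "11/9/2001", "20010911"):
    print(text, [d.isoformat() for d in candidates(text)])
```

The two slash-separated strings each fit both 11 September and 9 November 2001. Only the computer-generated 20010911 is unambiguous, and even then only if both sides agree on the calendar.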

At the other extreme, timing might be important. For example when ordering services like an aircraft flight time, or an insurance period start and finish. This brings in the question of time zones and daylight saving adjustments.

  • So if a Standard is to be universal, it has to be large and complex.

  • But if it is large and complex, it won't be easy to implement.

  • And your next partner will use the Standard in a way that is at least slightly different to all your others.

When you read "X12 is popular in North America, EDIFACT in Europe, TRADACOM in Britain" remember, just agreeing one Standard doesn't solve anything. One Standard used in two different ways is like two Standards. So the incentive to converge isn't there.

I have been watching the ODF v OOXML Standards dust up with interest. The difference with EDI is that an EDI Standard has many thousands of individual implementations. If we accept that overlapping Standards such as ODF and OOXML damage each other, and that differing interpretations of a specification lead to differing implementations and so to interchange problems between office applications, then we shouldn't be surprised if EDI is in trouble.

I think this is the real source of the excitement over XML. It gives structure to data even without a standard. If we accept that even with a standard, there is a need for a "mapping" function, then maybe we should aim to make this as easy as possible.
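As an illustration of what a small "mapping" function might look like, here is a Python sketch for the delivery date example. The order layout, the field names and the sample dates are all invented, and the rule that a date on the line beats the header date is only one possible reading of what a customer means.

```python
def delivery_dates(order):
    """Return one delivery date per order line, or None meaning ASAP.

    Assumed rule: a date on the line wins, otherwise the header date applies.
    """
    header_date = order.get("delivery_date")
    return [line.get("delivery_date") or header_date
            for line in order.get("lines", [])]

# Three customers sending the "same" purchase order in three different ways.
header_only = {"delivery_date": "2008-04-14",
               "lines": [{"item": "widget"}, {"item": "timber"}]}
line_level = {"lines": [{"item": "widget", "delivery_date": "2008-04-14"},
                        {"item": "timber", "delivery_date": "2008-04-21"}]}
no_dates = {"lines": [{"item": "paper"}]}

for order in (header_only, line_level, no_dates):
    print(delivery_dates(order))
```

Every partner who uses the dates differently needs a rule like this written, agreed and tested, which is where the cost creeps in.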