@yegor256 · Shift-M/51: Michael Kay about XSLT

Michael Kay was our special guest. Michael is the editor of the W3C XSLT 2.0 and 3.0 language specifications for performing XML transformations and the developer of the Saxon XSLT and XQuery processing software.

Video is here.


[00:00:00] Yegor: Hi, everybody. This is Shift-M Podcast, a new episode with a special guest, Michael Kay. He’s, in my opinion, the godfather of XSL and XSLT, the language and format which I love very much. You probably know about that if you follow my projects, if you follow me and my blog. So, I’m really glad to have Michael with me. Michael, could you please introduce yourself quickly?

[00:00:24] Michael: Okay, (chuckles), I’m Michael Kay. I’m the editor of the 2.0 and 3.0 XSLT specifications, and I’m the developer of the Saxon implementation of those specs.

[00:00:40] Yegor: You know, when I tell my readers, my listeners about XML, most of them or many of them, they say that I am quite old and I love that format because I…

[00:00:50] Michael: Not as old as me.

[00:00:51] Yegor: I’m not a modern programmer, I don’t understand JSON, YAML, and all these formats, and they keep telling me that XML is dead and that JSON is the future. What’s your take on that?

[00:01:01] Michael: They’re not engaged in the fields in which XML is important. XML is very, very widely used in all kinds of document publishing. It’s used for things like patents, for academic publishing, scientific publishing, legal publishing. is published in XML. That’s there because it’s got to last for 50 years. Well, legislation lasts for 700 years, and they chose a format that’s capable of surviving for that long. They are not going to change to something else quickly, so XML is definitely here to stay in those sorts of fields. What’s going to change is that people who need to do something quick and cheap and dirty are going to use the quickest, most effective solution available to them, and very often that’s JSON, particularly if you’re working in a web browser. So it’s a distinction between whether you are doing something for the long term or whether you’re doing something cheap and cheerful that’s only of local significance.

[00:02:16] Yegor: You know, they claim that XML, like you just mentioned, is a good format for documents, for something which is large and has to be standardized somehow. But when it’s a choice of what language or what format to use inside an application, let’s say we’re developing a mobile app or a web app, then choosing XML looks quite strange to them. They say, why not JSON, because we don’t need to be standardized. We don’t need to publish our documents anywhere, it’s all inside the application. They tend to use JSON more and more.

[00:02:48] Michael: Well, conventional programming languages have data models that aren’t very well suited to documents. Using the DOM is hard, it’s a complex structure, because the language wasn’t designed for that. There’s not a good mapping between the data structures used in the document world and the data structures used in conventional programming languages, which is where a language like XSLT comes in. It’s why there’s a niche for a special purpose language that’s designed for that kind of data, rather than the kind of data that you get in typical data processing applications.

[00:03:32] Yegor: And what is your story? How did you end up doing XML stuff, XSL stuff? Was it business, moneymaking, something that was necessary for you, or did you really like that format?

[00:03:44] Michael: It’s a long story. My specialism, my PhD many years ago (chuckles), was in database technology, and I spent 25 years working for the British computer manufacturer, mainframe manufacturer as it then was, ICL, on database technology. And we’d got to the point in the 90s where we were developing some big publishing systems for clients, particularly scientific publishing, news publishing. The 90s was when the web started to become sort of industrial scale in that kind of way. And at the time I was working with people in Fujitsu, developing object database technology, and we were trying to deliver those sites using the object databases, and finding that it didn’t actually work that well. It worked, we built some successful systems, but they were very expensive to build and we realized we hadn’t got enough sort of reuse and enough power in the tools that we were using to build lots of sites quickly. And XML was coming along, so we knew XML might be the answer to that problem, and then what really triggered it was we got a request to tender, an invitation to tender from Oxford University Press to do with the dictionaries, and that was very much SGML and XML-oriented. And I said, “Hey, this is our opportunity to get into XML. We’ve known for ages we need to look at it, this is the chance.” And the marketing people said, “No, that’s too high-risk. We’re not bidding for that contract with a technology that we have no experience in.” So I said, “Well, how the hell are we going to get experience in it (laughs), if we ignore an opportunity like this?” And so they said, “Okay, Mike, you’ve got three months. You write a response to that bid, see what you can do.”
And that’s really when I started playing with the technology. I started developing what became Saxon, as a prototype, to show to that customer, to show that we could handle XML and solve their problems, and it was good fun because I hadn’t done any programming for years. I’d been too senior to do any programming; in that kind of company, you’re expected to spend your time attending meetings, not actually writing code. So I wrote that prototype and presented it to the customer and the customer liked it, and our marketing people said, “Hey, this is high risk, we’ll double your estimates.” And so they doubled the price, and so the customer didn’t like the price and didn’t buy it. Which I was very aggrieved about (chuckles), but I’d had three months’ fun anyway. And that prototype eventually became the Saxon product. What I realized was that I’d been developing a Java library to do basically rule-based transformation of XML hierarchies. And then I saw that XSLT was coming along and that was doing rule-based transformation of XML hierarchies, and a lot of the concepts seemed to align very closely, so I thought, let’s turn my library into an XSLT processor, and that’s how it happened. And then I got invited to write a book on XSLT, and then as a result of the book I got headhunted by Software AG, and then it all moved on from there. But the key thing is, I didn’t start it. James Clark started XSLT, he invented the language. I picked it up and did versions two and three and whatever, and that’s been the story of my career, really: picking up a good idea, developed by a good ideas person, and industrializing it, turning it into a good product.

[00:07:42] Yegor: But when you were making the Saxon product, which I use in all of my projects right now, there were other products on the market, as far as I remember. I remember that like 15 or maybe more years ago, there was the Apache product, Xerces or what’s the right term?

[00:07:59] Michael: Yeah, I mean, there were a dozen XSLT 1.0 processors that came out very quickly, and it was a nice, small language, it wasn’t too expensive to implement, so. And of course, you know, there was a hype curve, XML had an enormous hype curve at the beginning, ‘cause you know, all the big players, Oracle and IBM, and Sun, and Microsoft, all for some reason decided rather than fighting each other, they’d collaborate on it. And so everyone was very excited by having a standard in that area, which we’d never had before. And then XSLT came along about a year behind as the way of processing it and lots of people had a go at implementing it. You know, there was a Microsoft implementation, there was the LotusXSL implementation from Lotus, which became part of IBM, and which then moved into what is now Xalan, and there were lots of what I call hobbyist implementations. Sablotron, no one knows about Sablotron anymore. There was one produced by (indistinct), in Python, called 4Suite, it was called, I think, at some stage. So there were lots of them, and how did Saxon sort of emerge? I think basically because I stuck at it. Most people produced a good version one product, and then never got beyond it. And why did they never get beyond it? Well, I think in the case of the commercial operations, IBM, Microsoft, what have you, their problem was producing a business case. The software was free, they weren’t making any money on it. You can produce a version one by promising your managers that it’s going to take over the world, but when it hasn’t taken over the world, then it’s quite hard to get the funding to produce version two. And then at the other end of the scale, there were the hobbyist people who were writing in their weekends.
And like, I guess their wives told them they wanted them to do something else with their weekend, or perhaps they got interested in some other new technology, because they tend to be the sort of people who like to do something new, and doing a version two of something old is not what you want to spend your weekends doing. So really the reason that Saxon carried on while those other things failed was stickability. I stuck to it and I found a business model that enabled me to fund the ongoing development and make some money to continue that development, which the other people hadn’t. So the product was good, but it wasn’t in any way, you know, so much better than all the others that it was going to beat them on technical grounds. It was much more that I had a successful business model that enabled me to keep developing.

[00:11:22] Yegor: I remember that like 10 years ago, when I was choosing the XSLT engine for my Java projects, Saxon was not an option for me because it was all commercial, if I’m not mistaken. So it’s kind of.

[00:11:34] Michael: I mean, there has always been an open source and a commercial version.

[00:11:40] Yegor: But there were some limitations, I remember that it was always kind of difficult.

[00:11:43] Michael: Yes, but they, in some ways the limitations, well that’s fair enough. I mean, the BBC’s coverage of the 2012 London Olympics was all using the free version of Saxon. I resented that slightly since they spent billions on the Olympics and I didn’t get any of it. (laughing loudly) But that’s the, you know, there’s a downside to the business model as well, which is you, I mean, from my point of view, the reasons for doing open source is that you get millions of users and then if, you know, if 1% or 5% of them decide that they need more than the open source version, then you get enough revenue to fund the whole thing and so open source is the marketing mechanism, it’s the loss leader that brings in the revenue. So I don’t mind the fact that there are billions of users of the open source version, because the model works well for me.

[00:12:46] Yegor: But why would people pay for Saxon now? I know of only one feature that I miss in the open source version, which is external functions. Aside from that, it has everything, so I don’t understand how your model works, what people pay you for.

[00:12:49] Michael: There are a lot of people who pay for it, not because they need the extra features, but because they like the sense of security of having a commercial relationship with a supplier. It’s not expensive for them after all, it’s a pretty small part of their total IT budget, but they feel more comfortable with a commercial product than with an open source one for something that they’re particularly dependent on. So that’s the answer for some people. For other people, yes, they need one or more of the features. They might need streaming, they might get benefit from schema awareness, they might benefit from the optimization capabilities, which become significant when you’re doing queries on large documents. So, yes, the open source version is good enough for 90% of users, but once people get stuck in, they find they need something else from the mix of things that we offer in, you know, the rest of it.

[00:14:09] Yegor: And how many programmers do you have right now in house if you can disclose this information?

[00:14:13] Michael: In the team?

[00:14:14] Yegor: Yeah.

[00:14:15] Michael: We’re a team of six people of whom four are developers.

[00:14:20] Yegor: And you are the developer, or you’re in the management side now?

[00:14:23] Michael: No, I’m cutting code everyday.

[00:14:27] Yegor: That’s amazing. Well, I know you from Stack Overflow, you’re not only writing code, you’re also answering questions there and quite helpful. You actually answered a few of my questions, no, I think many of them, so that’s my next question to you. So, how do you find time for that and how do you feel about the Stack Overflow platform? ‘Cause most people don’t do that and they claim that this platform is, has all the answers possible to be given, so that’s it, so no reason to be there because all the questions have been answered already, so people don’t spend time there, but you do, so.

[00:14:59] Michael: Yes and, I do it because I, partly because I think it’s a good idea in principle and partly because I enjoy it. Actually, everything I do is a combination of doing things because I think it’s a good thing to do and doing it because I enjoy it. If things don’t meet one of those two criteria, then they don’t get done (chuckles). Except for really necessary things like doing my tax returns. (laughing loudly) But on the whole, I do things because it seems a good idea and I enjoy it. Why do I enjoy it? Because I think it’s very important if you’re developing software to be in touch with your users. I actually quite enjoyed the first few years of doing Saxonica and I was spending half my time doing consultancy and so I actually got out to visit customers. In those days consultancy meant you actually traveled, you actually flew to California and visited your users and had dinner with them and drinks with them and things like that and that was fun, I miss that now. Just knowing them by email, isn’t the same. But even if it’s just electronic, knowing what your users are doing, knowing what they have difficulty with, knowing what they find easy, knowing what they find hard and picking up ideas from other people, answering the questions. That’s important. It’s a contact with the user base. And apart from bug reports, you know, it’s the only more or less, the only contact I get. And that sort of tells you, I mean, what makes a good product? Users understand the error messages, people will tell you, one thing I like about Saxon is the error messages. Well, that’s a really boring mundane thing, but to me, a bad error message is something that really needs to be fixed. That’s the, that’s what users are dealing with every day, they’re reading my error messages. If (chuckles), if those glare out as being unhelpful, as being badly spelled, then that’s their experience with the product, so it’s important to get it right. 
And I put a lot of effort into those sorts of little details, and to do that, you’ve got to have the contact with the user base. I mean, getting good error messages is really quite an art, because do you phrase the error message in terms of the proper terminology from the spec? Or do you use the terminology that the users are using out there, which might be quite wrong? What users call a tag isn’t what the specs call a tag. They’ll use tag to mean element. So which word am I going to use in an error message? It’s quite hard to get that sort of thing right, and getting a balance between a message that is technically correct and a message that users understand sometimes requires a fair bit of thought. And then you’ve got to phrase the error message in terms of what the user was trying to do, not in terms of what was going on internally. And that again gives you a significant challenge, so yeah, you have to think about those sorts of things. You firstly have to use the product yourself, that’s very important. And my project over the last year has been translating the Java code of Saxon into a C-Sharp version of Saxon. And so to do that, I had to write a translator from Java to C-Sharp, and so how do you write such a translator? Well, obviously using XSLT (chuckles). I mean, Java has a syntactic structure, you parse it and you get a syntax tree, so you’ve got a tree-structured information structure, and you’re converting that into another tree-structured information structure from which you generate C-Sharp. And how do you transform one tree to another? Obviously you use XSLT. It’s the natural choice that anyone would come up with, isn’t it (chuckles)? I’m joking (indistinct).

[00:19:19] Yegor: No, it’s not. You’ll be surprised, but it’s not (laughs). You know, I’m working right now with three projects, three different teams, and they write translators from one programming language to another programming language. None of these teams ever considered XSLT as a translator. I’m not a member of these teams, I only supervise them, so I cannot force them to make those decisions, but they don’t make this decision. Why? Because they probably don’t know about XSLT. So what they do is build this abstract syntax tree in memory as objects, not as an XML syntax tree, which is natural, just as you’ve said. They make Java objects or C++ objects, and then from these objects they build the other source code, basically with print-like statements.

[00:20:03] Michael: And of course, I mean, I’m joking because if I wasn’t involved with XSLT, then it wouldn’t occur to me to do it that way, or anything.

[00:20:10] Yegor: Okay.

[00:20:11] Michael: But, the fact is when you’re familiar with technology, then you can see that it’s ideally suited to that job.

[00:20:19] Yegor: Yeah, it’s perfect, yeah.

[00:20:21] Michael: And it’s perfect for it and so that’s the way you do it. And of course it has, the benefit is not that it’s XML or that it’s XSLT, the benefit is primarily the paradigm that you’re doing a recursive descent rule-based transformation. And that’s what XSLT is, it’s a rule-based language.
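Editor’s note: the “recursive descent rule-based transformation” paradigm Michael describes can be illustrated outside XSLT. The following is only a minimal Python sketch, not how Saxon or any XSLT processor works internally. The `template` decorator, the `apply_templates` helper, and the `add`/`mul`/`num` element names are all hypothetical, chosen to loosely mirror `xsl:template` and `xsl:apply-templates`.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: element name -> function that renders that element.
RULES = {}

def template(match):
    """Register a rule for elements named `match` (loosely like xsl:template match="...")."""
    def register(fn):
        RULES[match] = fn
        return fn
    return register

def apply_templates(node):
    """Render every child with whichever rule matches it (loosely like xsl:apply-templates)."""
    return "".join(transform(child) for child in node)

def transform(node):
    # Elements without a rule fall back to a default that simply descends.
    rule = RULES.get(node.tag, apply_templates)
    return rule(node)

@template("add")
def _add(node):
    left, right = node  # an <add> is assumed to have exactly two children
    return f"({transform(left)} + {transform(right)})"

@template("mul")
def _mul(node):
    left, right = node  # an <mul> is assumed to have exactly two children
    return f"({transform(left)} * {transform(right)})"

@template("num")
def _num(node):
    return node.text

def render(xml_text):
    """Parse an XML syntax tree and render it as an infix expression."""
    return transform(ET.fromstring(xml_text))

SAMPLE = "<expr><add><num>1</num><mul><num>2</num><num>3</num></mul></add></expr>"
print(render(SAMPLE))  # prints (1 + (2 * 3))
```

Each rule only knows about its own element and delegates everything below it, so supporting a new construct means adding a rule rather than editing a central traversal.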

[00:20:42] Yegor: You know, for most people, it’s hard to understand this language, that’s what they complain about. I wrote the compiler in XSLT just last year, the compiler like instead of, you know, making, it’s, I have a language which is the programming language, and then I had a Ja–, I have a task to implement the compiler from this language to Java so I made it entirely in XSLT. So I have many, many style sheets and they go in one after another, so it’s like a chain of style sheets and then I transfer the inputs, abstract syntax tree to the final result.

[00:21:10] Michael: But, well, if you read the textbook on writing compilers, it talks about it as a pipeline of tree-to-tree transformations, and that’s exactly the typical architecture of an XSLT application.
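Editor’s note: a pipeline of tree-to-tree transformations, like the chain of stylesheets Yegor mentions, might be sketched as follows. This is an illustration in Python under assumed names, not code from either speaker’s project. The two passes (`fold_constants`, `rename_print`) and the `program`/`print`/`add`/`num`/`call` vocabulary are invented for the example.

```python
import xml.etree.ElementTree as ET
from functools import reduce

def fold_constants(node):
    """Pass 1: replace an <add> of two <num> children with a single <num>."""
    children = [fold_constants(c) for c in node]
    if node.tag == "add" and len(children) == 2 and all(c.tag == "num" for c in children):
        out = ET.Element("num")
        out.text = str(sum(int(c.text) for c in children))
        return out
    out = ET.Element(node.tag, node.attrib)
    out.text = node.text
    out.extend(children)
    return out

def rename_print(node):
    """Pass 2: rewrite <print> into a target-language <call name="println">."""
    children = [rename_print(c) for c in node]
    if node.tag == "print":
        out = ET.Element("call", {"name": "println"})
    else:
        out = ET.Element(node.tag, node.attrib)
        out.text = node.text
    out.extend(children)
    return out

# Each pass takes a tree and returns a new tree, so passes compose freely.
PIPELINE = [fold_constants, rename_print]

def run_pipeline(xml_text, passes=PIPELINE):
    """Feed the parsed tree through every pass in order and serialize the result."""
    tree = reduce(lambda t, p: p(t), passes, ET.fromstring(xml_text))
    return ET.tostring(tree, encoding="unicode")

SOURCE = "<program><print><add><num>2</num><num>3</num></add></print></program>"
print(run_pipeline(SOURCE))
# prints <program><call name="println"><num>5</num></call></program>
```

Because every stage consumes and produces the same kind of tree, a stage can be inserted, removed, or tested on its own, which is the property the compiler-textbook architecture relies on.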

[00:21:25] Yegor: But when I show you this, when I show this code to other programmers, most of them just say, I don’t understand how it works, because it’s something I haven’t seen before, so they just.

[00:21:33] Michael: Yeah, it is, and the resistance to XSLT is because, for most people, particularly programmers (the direction non-programmers come from is quite different), it’s just so different from anything they’ve ever seen before that it requires some rewiring of the brain. And therefore, you know, the enthusiasts are those who get over the initial learning curve and discover why this weirdness is actually such a good thing, but for people first coming to it, it can be quite an obstacle.

[00:22:24] Yegor: Let me quote you. I was listening to one of your recent speeches at the conferences, and you said exactly this: “Most XSLT programmers don’t know computer science, they see examples and they understand how they work.” So (laughs), that sounds really accurate, because most people don’t really understand XSLT, they don’t understand that it’s a functional language, a functional programming language. They just.

[00:22:48] Michael: Yeah. Now, I mean, it’s, I find it fascinating because I find, I sometimes find new technologies quite hard to get familiar with and to adopt and that’s because I think I, when I look at a new technology, I want to have a deep conceptual understanding of it before I use it. I know other people who are much better at picking up something new who have a different learning style, they learn by example, they see something that works and they bend it and adapt it and make it fit without ever having a deep understanding. And it’s like, you know, some people can jump in a car and press the pedals and it goes in the right direction and other people really want to understand why you have to turn the steering wheel back after turning it, you know, it’s, you can over intellectualize things and I’m on that end of the spectrum, probably as a spectrum as well.

[00:23:59] Yegor: Don’t you think that we are getting more and more people of the first kind in the programming industry?

[00:24:06] Michael: Oh, absolutely. Yes. Yes. It’s the difference between engineers and mechanics. We’ve got a lot more mechanics now who are very, you know, capable of doing a good job, but they’re not computer scientists and, you know, as computer scientists, we’ve been building technology to enable those people to build systems. And so we shouldn’t complain.

[00:24:39] Yegor: Right (laughs). Do you think it will be ever possible to create something like a simplified version of XSLT? Which would look more, you know, I’m dreaming about this project for a few years already, something which will simplify the syntax of XSLT, because right now it’s basically XML and then XSLT is not a dialect, but it’s the language which we use as XML, basically and XSLT is just elements which we use there, right? But maybe we can turn it into something more use, more traditional for programmers, like Java, like the language where you have statements, statements after statements.

[00:25:21] Michael: A lot of people have tried, you know, to provide a different syntax for XSLT, and it’s not that difficult to do, and I think what’s happened is that when people have done it, they realize that actually they thought syntax was the problem and it wasn’t. The problem is not the syntax, it’s the concept. It’s what the hell is a template rule? What the hell does applying templates actually mean? You think syntax is the difficulty, but it’s not. The difficulty is actually the semantics of the language, and improving the syntax doesn’t help. The other thing is that once you’ve got past that stage of the syntax looking weird, you actually realize that there are some benefits to having an XML-based syntax. Any big XSLT-based application that I’ve seen ends up exploiting the fact that XSLT is XML and that you can generate XSLT and you can modify XSLT, you can build libraries of XSLT components and assemble them in different ways, and the fact that you’re using the same conceptual tool set to manipulate your data and your source code. I mean, that’s something that comes from Lisp, isn’t it? Data and programs are essentially the same thing, and XML and XSLT is, if you like, a continuation of that Lisp concept of not separating data from programs.
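Editor’s note: because a stylesheet is itself an XML document, ordinary XML tooling can generate one. The sketch below assumes a hypothetical helper, `build_rename_stylesheet`, that emits one template rule per element rename. It only builds and serializes the stylesheet; running it would need a separate XSLT processor, which the Python standard library does not include.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL_NS)  # serialize with the conventional xsl: prefix

def xsl(name):
    """Return the namespace-qualified tag for an XSLT instruction."""
    return f"{{{XSL_NS}}}{name}"

def build_rename_stylesheet(renames):
    """Build a stylesheet with one template rule per (old, new) element name."""
    sheet = ET.Element(xsl("stylesheet"), {"version": "1.0"})
    for old, new in renames.items():
        rule = ET.SubElement(sheet, xsl("template"), {"match": old})
        out = ET.SubElement(rule, xsl("element"), {"name": new})
        ET.SubElement(out, xsl("apply-templates"))
    return sheet

# Hypothetical rename map, used only for illustration.
sheet = build_rename_stylesheet({"para": "p", "title": "h1"})
stylesheet_text = ET.tostring(sheet, encoding="unicode")
print(stylesheet_text)
```

The same approach works in reverse: an existing stylesheet can be parsed, inspected, and modified as data, which is the Lisp-like property described above.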

[00:27:14] Yegor: So it’s not the syntax which is the problem? Let me ask you something else, about the committee which you sit on, like the W3C committee, which defines (indistinct).

[00:27:24] Michael: Which I did sit on, it’s now wound up, but.

[00:27:28] Yegor: So how do these guys work? How do these committees work? Can you explain? Like, you sit there, how do they define the standards, and how, for example, can I get in? Is it possible? Is it an open door or?

[00:27:39] Michael: Well, as I say, most of those committees have now disbanded through not having enough people to take things forward. I mean, I am hoping to reconvene a group to develop an XSLT 4. I’ve been promising that for a while, but it’s definitely going to happen this year (laughs), that’s my project for this year.

[00:28:01] Yegor: I would love to join by the way. (laughing loudly)

[00:28:06] Michael: So how does it work? The actual dynamics vary quite a lot from one group to another. On the XSL working group, I sort of inherited the role of editor from James Clark and the way the group worked didn’t really change after that. The process was very much that I was in a kind of chief designer role, not just editing the spec, which doesn’t mean that all the ideas were mine and that I had the final say on everything, it means that people brought their ideas to me and I had to turn them into something that worked or to come back and say, “No, I don’t think that will work.” Perhaps something else would work. It also means if I brought an idea to the group, then it would get subject to a lot of scrutiny, people would ask a lot of challenging questions. It would be improved in the course of review. But I was still sort of acting as chief designer with a group around me that was helping me to get the details right, if you like. So it was a very constructive and friendly group. The XQuery group was very different. When I first came to the XQuery group, I was sort of shocked and horrified because there were six people on the group, at least, who were capable of being chief designer. Who were capable of designing a programming language. And they all had radically different ideas about what (chuckles) the language should be. So it was a group that had immense tensions on it, simply because it had so much talent in the group. There were too many creative individuals with different ideas as to where to take the language, which is very much harder to work with in many ways and you didn’t have the same feeling of a cohesive sense of unified purpose. 
What you did have was a couple of people on that group, Paul Cotton, who was the chairman for most of the time I was involved, Don Chamberlin, who was the XQuery editor for lots of the time, and Mary Fernandez, who coordinated with the XSL group on defining XPath, who were extremely good moderators, who took the creative people by the scruff of the neck and banged their heads together and said, you know, this is what you agree about, this is what you disagree about, let’s concentrate on, you know, sorting out the areas you agree on, we’ll work out the areas you disagree on next week, and who forced the process through by managing the people who would otherwise have killed each other (chuckles). So, yeah, different groups have very different dynamics. I’m told that in the XML Schema group, before I joined it, they were meeting with 40 people there. And they all brought ideas to the table, and just managing the agenda was one of the biggest challenges, because they’d come to a week-long meeting, somewhere in a hotel, or somewhere in Florida, and they’d have more work on the table than they could get through in a week. So, yeah, it’s different for each group.

[00:32:05] Yegor: And how do these groups get created? Is somebody deciding that, or is it a chaotic process?

[00:32:17] Michael: I don’t really know because I’ve never been involved, it’s always been going by the time I got on board.

[00:32:33] Yegor: And how did you get on board?

[00:32:34] Michael: As I say, as with software, you know, I’ve heard someone say there are two kinds of people, there are starters and finishers, and I’ve always been the kind of person who finishes things that other people have started (laughs).

[00:32:35] Yegor: So how did you get on board? Somebody invited you, or you applied?

[00:32:38] Michael: I got invited by Sharon Adler, who is the chair of the XSL working group, basically on the strength of the Wiley book. So the group had developed XSLT 1.0 with Sharon Adler in the chair, and James Clark as editor, I developed an implementation and I was asked by Wiley to write a book on XSLT, which I did, when the book landed on Sharon’s desk, she picked up the phone and asked me to join the working group and then I had to persuade ICL to give me the time to (chuckles), to let me do that.

[00:33:16] Yegor: So basically it’s not like there’s a form which we can fill out to apply?

[00:33:19] Michael: Sure you can. Sure, yes, people do. They, I think most of the people who give it enough time to be worth having on the group are people who are not just doing it as a hobby. They’re definitely engaged, they need the group to be successful for professional and business reasons. Otherwise they find they haven’t got time to read all the papers and email, and then they get lost in meetings and they drift away.

[00:33:59] Yegor: Was it profitable for you to be in this group?

[00:34:04] Michael: In what sense?

[00:34:06] Yegor: What did you get out of it?

[00:34:12] Michael: It’s a very, it’s a creative process, it’s a rigorous process, it’s a very frustrating process. It’s all of those at the same time. If, as a software developer, I come up with an idea for a language feature, I can implement it that morning and think that it’s done. If I take that same feature to a standards group, it will be challenged. Why do we need it? Can’t it be done this other way? Why did you choose that keyword rather than a different keyword? Can’t you solve this other problem at the same time? You will come up with a vast number of challenges to your little idea that make it bigger or smaller or change it in all sorts of ways and that can be very frustrating ‘cause it takes a long time, but it also produces a much better result in the end than one person’s good idea from, you know, sitting in the bath. So yeah, that’s what you get out of it. You get the, a very thorough review of ideas in which people bring things to the table and you end up with synthesis. It doesn’t always produce a better result, you know, sometimes you do get the problem of committee compromises and you can definitely, there are definitely bad decisions that committees have made in XSLT and XQuery and XML and everything else where you just wish we hadn’t made that compromise but that’s the way of the world. One of the most difficult things is keeping the design coherent. Setting yourself design principles and sticking to them with every new feature. An example of that with XSLT is error handling. XSLT 1.0 had a sort of principle of no runtime errors and that’s because it was a lot of the driver for XSLT 1.0 was the idea of running it in the browser and in the browser, the last thing you want to do is on the user screen, put up something that says error on line 17 of style sheet, everything should produce an answer if you, even if it’s the wrong answer. 
So I think that was part of the 1.0 design thinking, but then people realized that that makes debugging very difficult. If an incorrect program just produces blank output, then it’s very hard for the programmer to work out what they did wrong. And so in 2.0, you started to get more of the concept of static errors and dynamic errors and a little more systematic approach to error handling. But then you find, although you’ve changed what you’re trying to achieve, you can’t change the existing language, you’re stuck with the way it was designed first time round, and so you end up with new features having one philosophy and the old features having a different philosophy, and you start to lose the coherence, and that’s hard to achieve, hard to get right.

[00:37:33] Yegor: So there is no one single architect in the group, it’s always the democratic decision-making process?

[00:37:40] Michael: Yeah, it’s not democratic, certainly in the way W3C works, it’s not democratic in the sense of taking a vote. The Tim Berners-Lee philosophy is very much the benevolent dictator. The chair has to declare that consensus has been achieved (chuckles), whatever that means. So it’s not numerical counting of votes, it’s people going away from the meeting, being prepared to accept the decision of the group.

[00:38:16] Yegor: But the decision has to be made by the group, not by one leader.

[00:38:19] Michael: The decision has to be made by the group, yeah, and that can be tough. And yes, you know, you will get compromises: I’ll let you have this feature if you let me have that one (laughs). Or, more often, the compromise is that one person will be very, very enthusiastic about some new feature, everyone else thinks it’s of marginal value, but it’s much easier to keep that person quiet by accepting their idea and putting it in the language than to have more and more arguments as to why it shouldn’t be added. So the easiest route out for a committee is sometimes to accept something it doesn’t really want rather than keep fighting against it. (laughing softly)

[00:39:11] Yegor: That’s weird. And you, at the same time being in the committee and the chief of Saxonica, your private company, so I feel that there is a, sometimes could be a conflict of interest, when you have this feature, you want this feature for your customers, you probably already implemented this feature and you gave it to your customers and then you bring this feature to the committee and say, I would love to have this in the standard. They may say the group may say, you know, we understand why you’re doing that because you are the, does it happen?

[00:39:39] Michael: I found it doesn’t usually work like that. I mean, one of the first things I did that wasn’t in the standard was grouping and multiple output files, and they sort of go together. I did grouping because it was clearly needed; in all the applications I’d had to write, I needed grouping, and so I invented a way to do it. Then when I joined the group, I took my grouping design to the working group, and everyone there accepted that we needed to do grouping and that the feature was needed, but there was then lots of constructive criticism about the way I’d implemented it in Saxon and questions about how edge cases work that I hadn’t even thought about, and, you know, more use cases: how will it handle this problem? How will it handle that problem? The group improved the design and I implemented the improved design. So that sort of syn–, I regard that as synergy between doing an implementation, having users and developing a standard, and most of the time it was synergistic: the fact that we had users, the fact that I had an implementation, the fact that we were developing the language actually worked in harmony.

[00:41:12] Yegor: And the source code of Saxon is open or not?

[00:41:16] Michael: There’s an open source product, there are other features which are proprietary. The schema processing is all proprietary, but the, you know, the XSLT code is largely open source. The streaming is proprietary.

[00:41:32] Yegor: It’s on SourceForge.

[00:41:34] Michael: Yeah, we’re sort of moving away from SourceForge. It has historically always been on SourceForge, but I think we now just use SourceForge for publishing new versions, to make sure people don’t download old versions. The main place you get it from is the repository on our own Saxonica site.

[00:41:53] Yegor: And where do you move to, GitHub?

[00:42:00] Michael: We’ve got some things on GitHub, but a lot of it is on repositories on our own site.

[00:42:05] Yegor: Okay.

[00:42:06] Michael: You can download it from there. We’re increasingly moving to, you know, on .NET, you download it from NuGet; on Node.js, you download it from npm; and Java people download from Maven.

[00:42:22] Yegor: Right. That’s what I do.

[00:42:24] Michael: Yeah.

[00:42:26] Yegor: And do you know anything about the possibility of compiling XSLT to some binary code? Because in my case, the performance is quite an issue. So XSLT is a great standard, it’s a great idea. I write all the style sheets, like I told you, in my compiler, I have many, many of these style sheets, but they have like maybe 30 of them and to run all of them one by one, it takes seconds, it doesn’t take microseconds. So I’m thinking maybe it would be possible to turn those XSLT into some binary code.

[00:42:59] Michael: Well, I mean, the answer is, quite a few years ago now, we did bytecode generation in the commercial product, the Enterprise Edition, and when we first did it, it gave us a performance improvement of somewhere between 25 and 50%. These days it tends to be less than that. It’s very often only 10 or 15%, which means it’s hardly worth doing. The reason the performance advantage has declined over time is that the Java HotSpot compiler has got better and we haven’t got better (chuckles). Because those guys who write the HotSpot compiler really understand what goes fast in machine code terms. To do code generation and make it go fast takes a mindset and a knowledge about the behavior of the hardware that most mere mortals don’t have. You don’t get the amount of benefit that you’d expect. And when we look at it in detail, the benefit that we are getting is not because we’re generating code, it’s because we’re making decisions at the right time. And you can actually reproduce that effect of making decisions at the right time by taking things out of the loop, for example, out of a runtime loop, into a static decision at code-generation time; you can do that without generating bytecode. So it turns out, I think, that code generation is not the answer to improved performance. The other thing we found is that in a lot of XSLT workloads, people are doing the static analysis on the style sheet, the compilation, once for every time they execute. And if you do that, then it becomes compile time that’s important, and not runtime. There are an awful lot of workloads where people are spending three seconds compiling the style sheet and then three milliseconds executing it. And if that’s the ratio in your workload, then the last thing you want is to move more work out of run time, to do more work at compile time in order to reduce the runtime, because the runtime is negligible. 
Also for all the simple transformations, the transformation is a lot faster than the parsing. So the XML parser is taking longer than the actual transformation. And so if you get a workflow like that, you know, taking a long time, there are all sorts of reasons for it, and the most common reason is that the style sheet’s being compiled every time it executes and the compilation is taking too long. We’ve only sort of grasped that fairly recently, that we really need to put more effort into compile time performance. Another reason is simply that you’re dealing with a high-level declarative language, and with high-level declarative languages, it’s like SQL, one line of code can take six hours to execute. You’re not writing at a low level where the statements you write in your program have a one-to-one correspondence with hardware instructions. There’s a very, very indirect relationship and therefore you need to think about the performance of your code in a different kind of way. And I mean, this is one of the dilemmas in making your code perform. The idea of a declarative language is that you don’t know what’s going on inside, it’s up to the optimizer to work out what’s going on inside, but the reality is that to write efficient code in a language like SQL or XSLT, you have to have some kind of appreciation of what you are asking the machine to do and whether it’s going to be a quadratic algorithm or an n log n algorithm or whatever, how it’s going to scale with your data size. You know, you’re several layers removed from the machine and yet to achieve performance, you’ve got to understand what’s going on in those layers, which is challenging.
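
The two points above, making decisions once rather than inside the loop and compiling once rather than on every execution, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule format, `matches_interpreted`, `compile_rule` and `get_compiled` are all hypothetical names invented here, and nothing in it reflects how Saxon works internally.

```python
from typing import Callable, Dict, Tuple

Record = Dict[str, str]
Rule = Dict[str, str]


def matches_interpreted(rule: Rule, record: Record) -> bool:
    """Decides what the rule means again on every single call."""
    value = record.get(rule["field"], "")
    if rule["op"] == "equals":
        return value == rule["value"]
    if rule["op"] == "starts-with":
        return value.startswith(rule["value"])
    raise ValueError(f"unknown operator: {rule['op']}")


def compile_rule(rule: Rule) -> Callable[[Record], bool]:
    """Inspects the rule once and returns a closure specialised for it.

    No bytecode is generated; the decision about the operator is simply
    taken out of the per-record loop.
    """
    field, op, expected = rule["field"], rule["op"], rule["value"]
    if op == "equals":
        return lambda record: record.get(field, "") == expected
    if op == "starts-with":
        return lambda record: record.get(field, "").startswith(expected)
    raise ValueError(f"unknown operator: {op}")


# Cache of compiled rules, so a rule is compiled once and executed many times.
_compiled: Dict[Tuple[str, str, str], Callable[[Record], bool]] = {}


def get_compiled(rule: Rule) -> Callable[[Record], bool]:
    key = (rule["field"], rule["op"], rule["value"])
    if key not in _compiled:
        _compiled[key] = compile_rule(rule)
    return _compiled[key]


rule = {"field": "name", "op": "starts-with", "value": "Sax"}
records = [{"name": "Saxon"}, {"name": "Xalan"}, {"name": "SaxonJS"}]

slow = [matches_interpreted(rule, r) for r in records]
matcher = get_compiled(rule)           # compiled on first use
fast = [matcher(r) for r in records]   # reused for every record
```

Both lists hold the same answers; the difference is only when the work of interpreting the rule happens. The cache plays the role of keeping a compiled style sheet around between runs instead of recompiling it each time.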

[00:47:39] Yegor: How are you, how tight are you integrated with the people who develop web browsers like Chrome, for example, or Firefox? Do you work with them?

[00:47:51] Michael: Not really.

[00:47:52] Yegor: Not really?

[00:47:53] Michael: No. I had a bit of a rant in my blog about 2005 or 2006 (laughs), at the stage where the browser developers were deciding that they didn’t like XML, they didn’t like XSLT.

[00:48:10] Yegor: Right.

[00:48:13] Michael: The rant was mainly about who are they to decide? You know, why should? It’s the power struggle, you know. I like the idea of an open layered architecture in which people own one layer and leave the layer above just to other people, whereas the web browsers remind me of the sort of very vertically integrated days when if you produce a computer, you controlled what applications were allowed to run on it. We’re seeing that again with mobile phones, aren’t we? I think the web browser should be a sort of neutral platform on which other people can develop technologies above it and where we haven’t really seen that, we’ve seen a lot of the very proprietary sort of space and yeah, they decided they weren’t interested in XML. We decided that we were going to do an XSLT processor in the browser anyway, which works pretty well but of course it’s a minority interest. The one thing that has pleased me recently, actually, is we’re seeing, most adoption of Saxon in the browser has been from people who are XML and XSLT enthusiasts who are very much committed members of the community. But we’ve been seeing a few users recently picking it up who are new to that and that’s nice to see.

[00:50:01] Yegor: ‘Cause that’s really sad. ‘Cause I think it’s, that would be a great, it is a great technology, having XML, having web servers deliver XML only, and then XSLT runs on the client on the Chrome and then does the (indistinct).

[00:50:17] Michael: It is absolutely the way things should be: you do all the rendering, all the user interaction in the browser, and the message sent from the server to the browser should be as abstract and as pure data as possible. And that means XML, no doubt, I mean, no doubt that’s the way things should be. But it hasn’t found favor, except that of course HTML has tried to develop in that direction of being a somewhat more abstract formulation; they’ve tried to get rid of the very presentation-oriented aspects of HTML. So they’ve moved HTML in the direction of XML, if you like, while rejecting XML itself.

[00:51:07] Yegor: Yeah. I want to ask you one question, which I like, from the Slack channel of yours, of your group, of the XML group. The question is, are there any features in XSLT or XML or XQuery which, if you had the power, you would replace, remove, change? Do they exist? Let’s start from there (indistinct).

[00:51:31] Michael: Yeah and I guess there are two kinds as well. There are some that are very little things, like the choice of keyword: xsl:value-of should have been xsl:text, and that leads to a lot of users making the same mistake, you know, hitting the same problem just because the choice of keyword is wrong. And similarly, the handling of default namespaces. Every user falls into that same trap that their path expressions don’t select anything because they didn’t think namespaces mattered. So, you know, we got the default wrong and that’s very hard to change. And then at the other end of the spectrum, I have doubts about some of the really big things that we did. So for example, the biggest thing we did in XSLT 2.0 was schema awareness. And I think in retrospect, it would be hard to say that schema awareness was a success. The idea was right, you can get considerable benefits from schema awareness: if you’re writing big style sheets, schema awareness can definitely make them more robust, easier to debug, it can give you a lot of software engineering benefits, but at the same time, it hasn’t been successful in terms of adoption. Most people aren’t using it and the reason they aren’t using it is because the short-term cost of adoption is high compared with immediate benefits. You’ve got a lifecycle benefit over using it, but if you’re sitting down on Monday morning and want to have some code running by lunchtime, then you leave out schema awareness because it seems too difficult. So there’s a sense in which it was strategically the right thing to do, but somehow in the way we did it, we just made it and I’ve constantly been trying to tweak it to try and make it more of a, you know, a magic switch: switch on schema awareness and get all the benefits, but it’s very hard to achieve that. It’s something that people have to put a lot of investment into before they can get the benefits out. So I have doubts about that sort of thing. 
Streaming similarly. Streaming is really valuable for the 3% of users who need it, but it has no value at all for the other 97%. And that makes you wonder whether the complexity of doing it in the spec was actually justified. So those are tough calls. You do want to increase the power of the language, but at the same time, I’ve always got much more satisfaction from doing little things that everyone benefited from, you know, like the double-bar concatenation operator. Everyone says, “What a great idea! Why couldn’t we do that before?” Rather than the big strategic things, which cost far more.
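
The default-namespace trap mentioned above is easy to reproduce outside XSLT. The sketch below uses Python’s standard `xml.etree.ElementTree` module, which has the same characteristic: an unprefixed name in a path does not match elements that sit in a default namespace. The sample document, the namespace URI `http://example.com/orders` and the prefix `o` are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# The xmlns declaration puts <order> and every <item> into a namespace,
# even though no prefix is visible on the element names.
XML = (
    '<order xmlns="http://example.com/orders">'
    "<item>pen</item><item>ink</item>"
    "</order>"
)

root = ET.fromstring(XML)

# The trap: the path looks right but selects nothing, because "item"
# means "item in no namespace".
unqualified = root.findall("item")

# The fix: bind a prefix to the namespace URI and use it in the path.
namespaces = {"o": "http://example.com/orders"}
qualified = root.findall("o:item", namespaces)

item_texts = [element.text for element in qualified]
```

Here `unqualified` is an empty list while `qualified` holds both items. Nothing reports an error in the first case, which is why the mistake is so easy to make and so hard to spot.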

[00:55:03] Yegor: What are you working on right now, XSLT 4.0?

[00:55:07] Michael: At the moment we’re in the process of hoping to ship Saxon 11. We’ve shipped Saxon 11 on C-Sharp, we’ve got to do a maintenance release of that, an 11.1 for C-Sharp, and the first release on Java, and move the Saxon C product forward to that same 11 code base. So we’re working on that and hopefully that will be out within a couple of weeks, and that means at the moment we’re in that process of running millions of tests and working out why three of them are failing (chuckles), which is (laughing), which is very dispiriting and frustrating. There are so many tests now that it’s a nightmare. Norm Tovey-Walsh, who joined the team last year, has been doing a lot of work on automating our build and test process and that’s been very, very valuable and hopefully will lead to more reliable releases, and more frequent releases. But after that, when we start getting a clean sheet of paper, we’ve got to make some decisions. I want to put some more effort into the Saxon-JS product, and that’s been, you know, a bit too quiet for a year. One of the problems is we’re not making any money on it; I want to see if I can find some way of producing a business model that generates some revenue for Saxon-JS, which will create a better justification for doing development work on it. A key technical challenge with Saxon-JS is doing asynchrony, because the JavaScript platform needs to be asynchronous. When you fetch resources, you’ve got to fetch them asynchronously, and that’s very hard to map into the XSLT way of doing things, so that’s a technical challenge. Other things, yeah, carrying forward the 4.0 initiative, trying to get a group of people together to define a 4.0, and that I hope will be lots of handy little things, rather than, you know, one big strategic thing that takes five years or 10 years, I think three is what we’re talking (laughs).

[00:57:23] Yegor: But how it’s going to happen if you said that the group has been disbanded?

[00:57:27] Michael: I have to put together a new community group basically. It won’t be under W3C except as a sort of hands-off relationship. A number of the groups have continued as community groups with a sort of informal way of working, but it’s different now because of course, W3C had the model that you only work on a standard if there are going to be at least three implementations. And with XSLT, you know, there’s no chance now of having three implementations. The only people who’ve implemented XSLT 3.0, well, there were three implementations. There was mine, there was Altova’s and then there was (indistinct), but (indistinct) seems to have found other things to do with his time, and Altova is an interesting company, because they always implement the standards, but they never participate in developing them. I don’t understand the logic of that, but that’s up to them, that’s the way they work, they’re entitled to do that. So it won’t be a standard in the same way, in that you expect to see lots of implementations; it’ll be more a specification for the next Saxon release, and the world’s changed in that way.

[00:58:52] Yegor: So there was a possibility that we may not see this as a standard, we just may see your next version of Saxon that’s?

[00:58:59] Michael: Oh, it might, yes, some people might perceive it as proprietary extensions, and we’ll almost certainly provide a way of switching off all the extensions so that you still do have official standards conformance. But yeah, the world changes when you haven’t got lots of implementations that have to be compatible with each other and when there’s little chance of getting them. You know, a lot of programming languages, you know, you take PHP and Python, you don’t have lots of alternative implementations of PHP or Python.

[00:59:35] Yegor: Right (chuckles).

[00:59:38] Michael: It’s not something you expect.

[00:59:41] Yegor: That’s right. Now, which programming language do you personally like most? Java, C-Sharp? What’s your favorite?

[00:59:48] Michael: Ooh! Well, there are lots I haven’t used and I’m sure out there somewhere is the perfect language. There are lots that I would like to have used more. I’ve done a very little bit of work with Scala, for example, and I would like to have done more with Scala. I would also like, I mean, over the years, I’ve become more and more a fan of functional programming and so writing in a pure functional language would have an appeal to me now. More and more of my Java and JavaScript is using a functional paradigm, that’s the way I now choose to write code, so a language that enforces that would be a good thing. What I miss in the programming languages I use is parallelism and asynchrony. I haven’t found a language where doing multi-threading and parallel processing becomes really reliable and robust and bug free. So, you know, I would quite fancy playing with Erlang to see if it solves some of those problems people say it does; I don’t know how that would work for me. Java has been a pretty good development for the industry. And I mean, C-Sharp is essentially exactly the same, except in minor details, but that mixture of object-oriented programming and a rich class library is a good tool set to work with. And of course the tools on top, we use IntelliJ, make it immensely more productive. They solve many of the, you know, verbosity problems, the boilerplate problems that you get in any programming language, so.

[01:01:52] Yegor: Including XSLT.

[01:02:00] Michael: Yeah, and I wish IntelliJ had better support for XSLT, but.

[01:02:08] Yegor: But you can do some debugging there.

[01:02:09] Michael: You can, yes, but Oxygen does it much better. Partly because Oxygen works with us closely and redistributes our product and we have a good relationship with Oxygen. We’ve never established a relationship with IntelliJ, with JetBrains. Well, I’ve tried (chuckles).

[01:02:31] Yegor: Okay. Okay, my last question, Michael, do you need any help from volunteers for your projects or you have enough people in the team and that’s it?

[01:02:44] Michael: I’m not good at managing volunteers, let’s put it that way. Volunteers do need a lot of managing. We’ve had some useful contributions over the years in code and ideas and tests, but very often the volunteers only do the fun bit of the work and they leave us to do the boring bits. So the number of times people have suggested a code change and I’ve said, yes, are you going to send us the tests? (laughing loudly) Or the documentation (chuckles)? And then you get this little blank look, (exclaims), tests. (laughing loudly) It doesn’t help that in the past we didn’t publish the test frameworks, so we made it difficult, but yes, I regard programming as a professional engineering discipline. I don’t want to work in the sort of field where it’s being done by amateurs in their own time who aren’t paid and don’t necessarily share your objectives and aren’t working to your timelines and things like that. The sort of vast volunteer initiatives like Firefox, I don’t know how those work, I’ve never participated in them. I can’t imagine how you can produce a decent bit of engineering in that sort of environment. So, yeah, but I mean the best contributions we get are people testing the product when it comes out and sending us good bug reports. That’s immensely valuable and people don’t believe it when I say I actually appreciate it when people send in bug reports. I do, I love it. It’s an enormous contribution to the reliability of the product.

[01:04:58] Yegor: Oh yeah, I will send you bug reports instead of posting questions on Stack Overflow. (laughing loudly) Because this is what I do now, I just go to Stack Overflow, I consider Stack Overflow as a sort of a bug reporting place, instead of.

[01:05:10] Michael: Yeah, it’s more than that, I mean, it’s also for many people, a help site. It’s a substitute for the fact that your colleagues don’t work at the next desk to you, the questions you would have asked, you know, to the person at the next desk, you now ask on Stack Overflow, but yes, it’s also a bug reporting site. It’s not actually a good way of managing bugs because there’s no way of saying, you know, give me a list of the open bugs and what their status is and how long they’ve been open. So it’s, it doesn’t give that kind of management, but yeah, so if someone reports something on Stack Overflow, we transfer it to our bug reporting system and manage it that way, which works quite well.

[01:06:03] Yegor: Okay, thank you very much for coming. I only can wish you really good luck in the XSLT development, it seems that we are in the difficult time right now, so we definitely all love XSLT 1.0, XSLT 2.0, but (laughs), the future is not as clear as it seems, right, as it has to be.

[01:06:22] Michael: Well, thanks very much. It’s been fun talking about the wider issues and I hope people are listening and get some insights from that.

[01:06:32] Yegor: Definitely, thanks a lot.

[01:06:34] Michael: Okay, cheers, bye.

[01:06:36] Yegor: Bye-bye, bye.
