Hamlet certainly had a lot on his mind, and perhaps for that reason might not have made a good software salesman. However, as software developers we should look positively on his desire to explore his alternatives. Even within a particular paradigm (let’s restrict ourselves to “data pipelining”) there is more than one way to do it.
If we look via the web to our shared knowledge about data pipelining (Reference 1 is a good starting point, and there are certainly many links from there) we can find a huge variety of implementations, going back well over 50 years. The “visual programming” paradigm didn’t in fact come along late in this game (1966 is the date of one reference which I have found), though it has certainly taken off much more steeply in the last five years.
The concept of a network of reusable components that can be visually joined up into a pipeline and configured to do a particular task can be taken to the extreme. Indeed, the author’s “favorites” bar in his data pipelining tool of choice, BIOVIA Pipeline Pilot, shown below, reveals a touch of idolatry.
What are the Components?
The two circled components are represented not by their names but simply by symbols. They become iconic – their actions are represented by the abstract concepts of filtering (to split the data pipeline) and manipulating the data on the pipeline.
Yet these are script-based components (their proper names are “Custom Manipulator (PilotScript)” and “Custom Filter (PilotScript)”). The first allows any script-based manipulation of the data flowing down the pipeline. The second allows similar scripting, but its final statement must evaluate to a Boolean, which splits the pipeline in two.
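To make the difference between the two components concrete, here is a minimal sketch in plain Python. It is not PilotScript and it does not use any Pipeline Pilot API; the function names `custom_manipulator` and `custom_filter`, the record layout and the `ratio` property are all hypothetical, chosen only to mirror the behaviour described above.

```python
# Illustrative only: plain Python standing in for the two scripted components.
# Records are modelled as dicts; all names here are assumptions for the sketch.

def custom_manipulator(records, script):
    """Run `script` on a copy of every record and pass all records on."""
    out = []
    for record in records:
        updated = dict(record)   # leave the incoming record untouched
        script(updated)          # the script may add or change properties
        out.append(updated)
    return out

def custom_filter(records, predicate):
    """Split the stream in two according to a Boolean result per record."""
    passed, failed = [], []
    for record in records:
        (passed if predicate(record) else failed).append(record)
    return passed, failed

def add_ratio(record):
    # Hypothetical manipulation: derive a new property from two existing ones.
    record["ratio"] = record["mass"] / record["volume"]

records = [{"mass": 10.0, "volume": 2.0}, {"mass": 3.0, "volume": 6.0}]
manipulated = custom_manipulator(records, add_ratio)
dense, light = custom_filter(manipulated, lambda r: r["ratio"] > 1.0)
```

The manipulator always has one output stream, while the filter has two, and that is the only structural difference between them in this sketch.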
They both offer the user a rather nice expression editor with everything you would expect (syntax highlighting, IntelliSense, find & replace, etc.). The figure below shows the expression editor after pressing a hotkey that reminds you which properties are available at that point on the data pipeline:
This particular component applies PilotScript, the native scripting language of Pipeline Pilot. But if Python is your bag, or Java, or whatever, a similar component can be found.
But isn’t this sacrilege? Should we be writing scripts when we could instead use correctly configured components to do what we need? “Must give us pause”.
An example should help. We recently helped someone “munge” some data, which had the usual hallmarks of a personally invented data format, never to be encountered again (groups of records separated by a blank line, strange JSON-like ‘[’ and ‘]’ for no apparent reason, etc.).
The first step was to do some “extreme programming” with the data and some grouping components, but we soon realized Pipeline Pilot allows you to script as well, and in minutes we had exactly what we were looking for.
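For a flavour of the kind of script involved, here is a rough Python sketch. The original file format is not reproduced in this article, so the sample below is invented: it assumes `key: value` lines wrapped in stray square brackets, with a blank line between groups. The function name `parse_groups` is likewise hypothetical, and the script we actually wrote was PilotScript, not Python.

```python
# Illustrative only: the sample format and every name here are assumptions.

def parse_groups(text):
    """Turn blank-line-separated groups of '[key: value]' lines into dicts."""
    records, current = [], {}
    for line in text.splitlines():
        # Drop surrounding whitespace and the stray '[' and ']' characters.
        cleaned = line.strip().strip("[]").strip()
        if not cleaned:
            # A blank line closes the current group, if there is one.
            if current:
                records.append(current)
                current = {}
            continue
        key, _, value = cleaned.partition(":")
        current[key.strip()] = value.strip()
    if current:                  # the last group may lack a trailing blank line
        records.append(current)
    return records

sample = """[name: aspirin]
[mass: 180.16]

[name: caffeine]
[mass: 194.19]
"""

records = parse_groups(sample)
```

A dozen lines of script like this replace what would otherwise be a chain of splitting, grouping and renaming components, which is why scripting was the quicker route here.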
Of course, what makes this work is the set of functions or “methods” available in the scripting language. This is why we reach for big hitters like Java and Python when we script. But PilotScript is also well supplied with functions, from sets of functions for manipulating the (possibly hierarchical) data records, through sets for manipulating chemistry, to sets for manipulating the components themselves (and therefore “programming on the visual language”, a surprisingly useful capability).
And that is how it should be. As developers, we have a good knowledge of our tools and skills and should be able to make choices on the spur of the moment rather than adhere to any one paradigm. As Hamlet put it, don’t get “sicklied o’er with the pale cast of thought.” Trust yourself, and grab the tool you instinctively feel is right. Just make sure that you have a complete toolbox.