Recently I have been building a batch processing system for Weblicht.
Weblicht is an orchestration tool for running a pipeline of linguistic tools.
The problem with the current Weblicht is that it can only process relatively small files.
All of the linguistic tools are wrapped as RESTful web services.
Weblicht calls the services sequentially, following a pipeline that the user constructs in the GUI.
There are some limitations in this architecture.
- A call to one of the web services in the pipeline may fail. In that case, the results of the services that have already been called are discarded as well, although they might be of interest to the user.
- All of the services run sequentially, so downstream services spend a lot of time waiting for upstream services to process the whole file. A better solution might be to process a large file part by part. As soon as a service has processed one part successfully, it returns the result to Weblicht, and the next service can fetch that data from Weblicht and start processing. In the meantime, the previous service can work on the remaining parts. This can increase the throughput of the system.
- There are other limitations as well: the browser session can time out before processing is finished, so users need to keep their browser open. (This is solved by WaaS, Weblicht as a web service: users can send a chain and files to WaaS, but this requires some minimal programming skills that our target users may not have.)
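The part-by-part idea from the second point can be sketched in a few lines of Python. This is only an illustration under assumptions, not how Weblicht works: the services are stand-in Python functions rather than web services, and the names `split_into_chunks`, `run_stage` and `run_pipeline` are made up for this example. Each service runs in its own thread, queues connect the stages, and every stage keeps what it has produced, so results from earlier stages survive a failure further down.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the chunk stream


def split_into_chunks(text, size):
    """Split the input into parts of at most `size` lines."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]


def run_stage(service, inbox, outbox, saved):
    """Apply one service to every incoming chunk and forward each result at once."""
    while True:
        item = inbox.get()
        if item is _DONE:
            break
        index, chunk = item
        try:
            result = service(chunk)
        except Exception:
            # This chunk stops here; results saved by earlier stages are kept.
            continue
        saved[index] = result
        outbox.put((index, result))
    outbox.put(_DONE)


def run_pipeline(services, chunks):
    """Run the services as concurrent stages connected by queues.

    Returns one dict per service, mapping chunk index to that service's output.
    """
    queues = [queue.Queue() for _ in range(len(services) + 1)]
    saved = [{} for _ in services]
    threads = [
        threading.Thread(target=run_stage,
                         args=(service, queues[i], queues[i + 1], saved[i]))
        for i, service in enumerate(services)
    ]
    for thread in threads:
        thread.start()
    for item in enumerate(chunks):
        queues[0].put(item)
    queues[0].put(_DONE)
    for thread in threads:
        thread.join()
    return saved
```

With two stand-in services, for example `str.split` followed by `len`, the second stage can start counting tokens of the first chunk while the first stage is still splitting the later ones.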
To address these issues, I have been working on a new browser-based application.
Files in webba need to be stored properly. These include files uploaded directly by users as well as intermediate results.
A user should have access only to the files they uploaded and to the intermediate result files generated by their own tasks.
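One simple way to enforce this rule is to give every user their own directory and to refuse any file name that resolves outside of it. The sketch below is an assumption for illustration, not webba's actual storage code; the class name `UserFileStore`, its methods and the directory layout are all hypothetical.

```python
from pathlib import Path


class UserFileStore:
    """Keeps each user's uploads and intermediate results in their own directory."""

    def __init__(self, root):
        self.root = Path(root).resolve()

    def _resolve(self, user_id, relative_name):
        # Hypothetical rule: user ids are plain alphanumeric strings.
        if not user_id.isalnum():
            raise ValueError("user id must be alphanumeric")
        base = self.root / user_id
        path = (base / relative_name).resolve()
        # Reject names such as "../other/file" that escape the user's directory.
        if base not in path.parents:
            raise PermissionError(
                f"{relative_name!r} is outside the files of {user_id!r}")
        return path

    def save(self, user_id, relative_name, data):
        """Write `data` (bytes) to a file below the user's directory."""
        path = self._resolve(user_id, relative_name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return path

    def read(self, user_id, relative_name):
        """Return the bytes of a file below the user's directory."""
        return self._resolve(user_id, relative_name).read_bytes()
```

Uploads and intermediate results could then live side by side, for example as `uploads/input.txt` and `task1/tokens.xml` under the same user directory, and a request for `../bob/task1/tokens.xml` made on behalf of another user raises `PermissionError`.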