Hi Pierre, thanks for your reply.

Regarding the file structure, what I am still not very clear about is what you define as Raw_data and data, how the actual spectra are stored, and how the association to spatial coordinates and parameters is made. That's why I would really appreciate it if you could make a document similar to what I did (https://github.com/prevedel-lab/Brillouin-standard-file/blob/main/docs/Bh5_f...), where it is clear what each group should (or can) contain, what the shape of each dataset is, which attributes they have, etc. I am not saying that this should be the final structure of the file, but I strongly believe that writing it down in a structured way helps define it and reveal potential issues.

Regarding the webapp, it works by separating the frontend, which runs in the browser and is responsible for the GUI, from the backend, which does the actual computation and which you can structure as you like, with multithreading or any optimization you wish. The communication between frontend and backend is handled transparently by the framework, so you don't have to take care of it. When you run Dash locally, a local server is created, so the data stays on your computer and there is no issue with data transfer/privacy. If we want to move it to a server at a later stage, then you are right that data needs to be transferred to a server (although there might be solutions that allow you to still run the computation in the local browser, like WebAssembly (https://webassembly.org/), but I think it is too complex to look into this at this stage).

Regarding safety and privacy, the data will be stored on the server only for the time that the user is using the app and then deleted (of course we will need to put a disclaimer on the website about this). Regarding space, I don't think people will load their 1 TB datasets onto the webapp at the beginning, but rather look at some small datasets of a few tens of GB; in that case 1 TB of server space is more than enough (remember that the file will be stored only for the time the user is using the app). If people really start to use the webapp and space becomes a limitation, I would be super happy to look into WebAssembly so that the data can be handled locally in the browser without transferring it to the server.

To summarize, I would agree that, at this stage, there might be no advantage of a webapp over a local GUI (and it might actually be a bit more work to develop). But if this project really flies, the potential of a webapp is huge, especially in the context of the BioBrillouin society, where we want to have a database of spectral data, and the webapp could be used to explore it without downloading anything to your computer. My main point is that moving to a webapp now would not be too much work, but if we want to do it at a later stage it will basically entail re-writing everything from scratch.

Let me know what your thoughts are.

Best,
Carlo

On Wed, Feb 19, 2025 at 08:51, Pierre Bouvet wrote:

Hi Carlo,

You're right, the format is still a little fuzzy. The main idea is to have a data-based hierarchy for storage: each dataset is stored in an individual group. From there, the abscissae the data depend on are stored in the same group, and the treated data are stored in sub-groups. The names of the groups and sub-groups are fixed (Data_i), as are the names of the raw data (Raw_data) and abscissae (Abscissa_i). Also, the measurement- and spectrometer-dependent arguments have a defined nomenclature.
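In h5py, that layout would look something like this (a minimal sketch only; the group/dataset names Data_i, Raw_data, Abscissa_i follow the description above, while the attribute names and shapes are assumptions):

    import h5py
    import numpy as np

    with h5py.File("example.h5", "w") as f:
        grp = f.create_group("Data_0")
        grp.attrs["Name"] = "Water calibration"     # human-readable, not an identifier
        grp.attrs["Spectrometer.Type"] = "VIPA"     # assumed nomenclature
        grp.create_dataset("Raw_data", data=np.zeros((100, 100, 512)))    # raw spectra
        grp.create_dataset("Abscissa_0", data=np.linspace(-15, 15, 512))  # e.g. frequency axis
        # treated data live in a sub-group, which inherits the parent's
        # attributes and abscissae unless it redefines them
        sub = grp.create_group("Data_0")
        sub.create_dataset("Raw_data", data=np.zeros((100, 100, 512)))    # e.g. the PSD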
To differentiate groups between them, we preferably give them a name as an attribute that is not used as an identifier, the identifier being either the path to the data/group from the file root (Data_0/Data_42/Data_2/Raw_data) or potentially a hash value (not implemented). This, I think, allows all possible configurations, and using a hierarchical approach we can also store common attributes and arrays (parameters of the spectrometer or abscissa arrays, for example) on the parent to reduce memory complexity.

Now regarding the use of a server-based GUI: first off, I've never used one, so I'm just making suppositions here, but if the platform is able to treat data online, it will have to store the spectra somewhere in memory, and it will not be local. My primary concerns with this approach are safety, particularly in terms of intellectual property protection, ethical considerations, and potential security vulnerabilities. Additionally, managing storage space will be a headache. Now, this doesn't mean that we can't have a server-based data plotter, or a server-based interface that does treatments locally, but I don't really see the benefits of this over a local software that can have multiple windows, that could at one point be multithreaded, and that could wrap C code to speed up regressions, for example (some of this might apply to Dash, I'm not super familiar with it).

Now regarding memory complexity, having all the data we treat go to a server is a bad idea, as it will raise the question of cost, which will very rapidly become a real problem. Just an example: for a 100x100 map, I need 10 GB of memory (1 MB/point) with my setup (1 TB of archive storage is approximately 1 euro/month), so this would get out of hand super fast, assuming people use it. Now maybe there are solutions I don't see for these problems and someone can take care of them, but for me it's just not worth the effort when having a local GUI solves all of these issues at once. But if you find a solution, then I'll be happy to migrate to Dash; it won't be fast to translate every feature, but I can totally join you on the platform.

Best,
Pierre

Pierre Bouvet, PhD
Post-doctoral Fellow
Medical University Vienna
Department of Anatomy and Cell Biology
Wahringer Straße 13, 1090 Wien, Austria

On 18/2/25, at 20:26, Carlo Bevilacqua wrote:

Hi Pierre,

Regarding the structure of the file, I agree that we should keep it simple. I am not suggesting making it more complex, but rather having a document where we define it in a clear and structured way rather than as a general description, to make sure that we are all on the same page. I am saying this because, to be honest, I am not sure I fully understand how you are structuring the HDF5, and I think it is important to agree on this now rather than finding out at a later stage that we had different ideas.

Ideally, if the GUI should be part of a single application, we should write it using a unified framework. The reasons why, after considering it for a bit, I lean towards Dash rather than Qt are:
* it can run in a web browser, so it will be easy to eventually move it to a website, where people can use it without installing anything (which will hopefully help in promoting its use)
* it is based on plotly (https://plotly.com/python/), a graphical library with very good plotting capabilities and high customizability, which would make the data visualization easier/more appealing

Let me know what you think about it and if you see any advantage of Qt over a web app which I am not considering.
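For a concrete picture, a minimal Dash app looks something like this (a toy sketch: the layout is rendered in the browser, the callback runs in local Python; the slider, names, and simulated spectrum are illustrative only, not our GUI):

    from dash import Dash, dcc, html, Input, Output
    import plotly.express as px
    import numpy as np

    app = Dash(__name__)
    # Frontend: declarative layout rendered in the browser.
    app.layout = html.Div([
        dcc.Slider(id="point", min=0, max=99, step=1, value=0),
        dcc.Graph(id="spectrum"),
    ])

    # Backend: plain Python running on the (local) server.
    @app.callback(Output("spectrum", "figure"), Input("point", "value"))
    def show_spectrum(i):
        x = np.linspace(-15, 15, 512)  # GHz, illustrative
        y = 1.0 / (1.0 + ((x - 7.5) / 0.4) ** 2) + 0.05 * np.random.rand(512)
        return px.line(x=x, y=y,
                       labels={"x": "Frequency shift (GHz)", "y": "Intensity"})

    if __name__ == "__main__":
        app.run(debug=True)  # serves on localhost; data never leaves the machine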
Also, if we agree that Dash might be a better option, would you consider migrating to Dash? It might not be too much work if most of the code you wrote is for data processing rather than for the GUI itself, and I could help you with that. Alternatively, one workaround is to have a QtWebEngine (https://doc.qt.io/qtforpython-6/overviews/qtwebengine-overview.html) in your Qt app and have the Dash app run inside it, but that would make only the Dash part portable to a server later on.

Best,
Carlo

On Tue, Feb 18, 2025 at 10:24, Pierre Bouvet wrote:

Hi,

Thanks. More than merging them later, I would just keep them separate in the process and, rather than trying to build "one code to do it all", build one library and GUI that encapsulate codes to do everything. Having a structure is a must, but it needs to be kept as simple as possible, or we'll rapidly find situations where we need to change the structure. The middle ground, I think, is to force the use of hierarchies and impose nomenclatures where problems are expected to appear: each dataset has its own group; each group can encapsulate other groups; each parameter of a group applies to all its sub-groups if the sub-group does not change its value; each array of a group applies to all of its sub-groups if the sub-group does not redefine an array with the same name; the names of the groups are held as parameters and their IDs are managed by the software (the inheritance lookup is sketched at the end of this message).

For the Plotly interface, I don't know how to integrate it into Qt, but if you find a way to do it, that's perfect with me :)

My vision of the GUI is something that unifies the format and can then execute scripts on the data stored in this format. I have pushed the application of the GUI to my data up to the point where I can do the whole information-extraction process from it (add data, convert to PSD, fit curves). I think this could be super interesting if we implement it for all techniques.

I think we could formalize the project in milestones, in particular define the minimal denominator that will allow us to have the paper. The milestone I reached for my spectrometer encompasses the following points:
- Being able to add data to the file easily (by dragging & dropping into the GUI)
- Being able to assign properties to these data easily (again by dragging & dropping)
- Being able to structure the added data in groups/folders/containers/however we want to call it
- Making it easy for new data types to be loaded
- Allowing data of the same type but different structure to be added (e.g. .dat files)
- Executing scripts on the data easily and allowing parameters of these scripts to be defined from the GUI (e.g. select a peak on a curve to fit this peak)
- Making it easy to add scripts for treating or extracting the PSD from raw data
- Allowing the export of a Python code to access the data from the file (we can see them as "break points" in the treatment pipeline)
- Editing of properties inside the GUI

In any case, I think we could build a spec sheet for the project with what we want to have in it, based on what we want to advertise in the paper. We can always add things later on, but if we agree on a strict minimum to have the project advertised, then that will set its first milestone, on which we'll be able to build later on.

Best,
Pierre

Pierre Bouvet, PhD
Post-doctoral Fellow
Medical University Vienna
Department of Anatomy and Cell Biology
Wahringer Straße 13, 1090 Wien, Austria
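The parameter-inheritance rule above could be resolved at read time like this (a sketch only; the helper name and fallback behaviour are assumptions, not the implemented HDF5_BLS API):

    import h5py

    def resolve_attr(node, name):
        """Walk up the hierarchy until an attribute of this name is found.

        Implements the rule that a parameter set on a group applies to all
        sub-groups unless a sub-group redefines it. Hypothetical helper."""
        while node is not None:
            if name in node.attrs:
                return node.attrs[name]
            node = node.parent if node.name != "/" else None
        raise KeyError(name)

    # usage (hypothetical file and attribute):
    # with h5py.File("example.h5", "r") as f:
    #     print(resolve_attr(f["Data_0/Data_2"], "Spectrometer.Type"))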
On 17/2/25, at 14:53, Carlo Bevilacqua via Software wrote:

Hi Pierre, hi Sal,

Thanks for sharing your thoughts about it. @Pierre I am very sorry that Ren passed away :(

As far as I understood, you are suggesting to work on the individual aspects separately and then merge them together at a later stage? I am fine with that, but I still think it is very important to agree now on what should be in the file format we are defining, because that is kind of the core of the project and will affect a lot of the design choices.

I am happy to start working on the GUI for data visualization. In my idea, it will be something similar to this (https://www.nature.com/articles/s41592-023-02054-z/figures/10) but written in Dash (https://dash.plotly.com/), so it is a web app that for now can run locally (so no transfer of data to an external server is required) but in the future can easily be uploaded to a website that people can just use without installing anything on their computer. The question is how much of the data processing should be possible to do (or trigger) from the same GUI. I think at least a drop-down menu with different fitting functions should be there, so the user can quickly check the results on a spectrum from a specific point in the image.

@Pierre how are you envisioning the GUI you are working on? As far as I understood, it is mainly to get the raw data and save it to our HDF5 format with some treatment on it. One idea could be to have a shared list of features that we are implementing in the GUI as we work on it, to avoid having the same functionality duplicated (or, worse, inconsistent) between the GUI for generating our file format and the GUI for visualization. Let me know what you think about it.

Best,
Carlo

On Mon, Feb 17, 2025 at 11:04, Pierre Bouvet via Software wrote:

Hi everyone,

Sorry for the delayed response, I just went through the loss of Ren (my dog) and wasn't at all able to answer.

First off, Sal, I am making progress and I should have everything you have made on your branch integrated into the GUI branch soon, at which point I will push everything to main.

Regarding the project, I think I align with Sal: keep things as simple as possible. My approach was to divide everything into 3 mostly independent layers:
- Store data in an organized file that anyone can use, and make it easy to do
- Convert these data into something that has physical significance: a Power Spectral Density (PSD)
- Extract information from this Power Spectral Density (layers 2 and 3 are sketched below)

Each layer has its own challenges, but they are all independent of the challenge of getting people to use it: personally, if someone came to me with a new software to play with my measurements, I would most likely only look at it for 1 minute (or less, depending on who made it) with a lot of apprehension, and then, based on how simple it looks and how much I understood of it, use it or, most likely, discard it. This is why Project.pdf (https://github.com/bio-brillouin/HDF5_BLS/blob/GUI_development/guides/Projec...) is essentially meant to say: we put data arrays in groups and give them attributes written in text, but you can also create groups within groups to organize your data! I believe most of the people in the BioBrillouin society will have the same approach, and before having something complex that can do a lot, I think it's best to have something simple that can just unify the format (which in itself is a lot) and that people can use right away with a minimal impact on their own individual pipelines for data processing.
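As a toy instance of layers 2 and 3 for a time-domain trace (the function names, sampling, and simulated signal are illustrative, not the actual pipeline; layer 1 would hand the trace over from the unified HDF5 file):

    import numpy as np

    # Layer 2 (example for one modality): PSD as |FFT|^2 of a time trace.
    def to_psd(trace, dt):
        freq = np.fft.rfftfreq(trace.size, dt)
        psd = np.abs(np.fft.rfft(trace)) ** 2
        return freq, psd

    # Layer 3 (crude example): take the peak position as the Brillouin shift.
    def extract_shift(freq, psd):
        return freq[np.argmax(psd[1:]) + 1]  # skip the DC bin

    t = np.arange(8192) * 1e-11                              # 10 ps sampling
    trace = np.cos(2 * np.pi * 5e9 * t) * np.exp(-t / 2e-9)  # 5 GHz, decaying
    freq, psd = to_psd(trace, 1e-11)
    print(f"Brillouin shift ~ {extract_shift(freq, psd) / 1e9:.2f} GHz")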
To be honest, I don't think people will blindly trust our software at first to treat their data; they will most likely use it at first to organize their raw spectra, and maybe add their own treated data if they are feeling particularly adventurous. But that is not a problem as long as we start a movement. From there, yes, we can add complexity and create custom file architectures, but that is already too complex, I think, for this first project. If we can all just import our data into the wrapper and specify their attributes, this will already be a success. Then, for the paper, if we can all add a custom code to treat these data to obtain a PSD, I think this will be enough. As I said, the current API already allows you to add your data in a variety of formats, so it's just a question of developing the PSD conversion before having something we can publish (and then telling people to use it).

A few extra points:
- Working with my setup, I realized that if the user could choose between different algorithms to treat their own data, it would make the overall project much more usable (if an algorithm bugs for whatever reason, you can use another one). So I created 2 bottlenecks in the form of functions (for PSD conversion and treatment) that inspect modules dedicated to either PSD conversion or treatment and list the existing functions. It's in between classical and modular programming, but it makes the development of new PSD-conversion and treatment code way, way easier (a sketch of this follows below).
- The GUI is developed using object-oriented programming, so I have already made some low-level choices that impact all of the GUI. I'm not saying they are the best choices, I'm just saying that they work. So if you want to work on the GUI, I would recommend either getting familiar with these choices, or making sure that all the functionalities are preserved, particularly the ones that are invisible (logging of treatment steps, treatment errors, ...).

I'll try merging the branches on Git asap and will definitely send you all an email when it's done :)

Best,
Pierre

Pierre Bouvet, PhD
Post-doctoral Fellow
Medical University Vienna
Department of Anatomy and Cell Biology
Wahringer Straße 13, 1090 Wien, Austria
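A minimal sketch of such a bottleneck function, assuming the algorithms live in a dedicated module (the module and function names are hypothetical, not the actual HDF5_BLS layout):

    import importlib
    import inspect

    def list_algorithms(module_name):
        """Inspect a module and list its public functions by name.

        Sketch of the 'bottleneck' idea: the GUI asks this function what
        algorithms exist instead of hard-coding a fixed list."""
        module = importlib.import_module(module_name)
        return {name: func
                for name, func in inspect.getmembers(module, inspect.isfunction)
                if not name.startswith("_")}

    # usage (hypothetical module and function names):
    # algos = list_algorithms("HDF5_BLS.psd_conversion")
    # psd = algos["vipa_to_psd"](raw_frame)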
On 14/2/25, at 19:36, Sal La Cavera III via Software wrote:

Hi all,

I agree with the things enumerated and the points made by Kareem/Carlo! I guess I would advocate for starting from a position of simplicity. We probably don't want to get weighed down by trying to include a bunch of bells and whistles from the start, as these can easily be added once there is a minimally viable, stable product. As long as data can be loaded into local memory and then treated in various (initially simple) ways through the GUI (and maybe we maintain non-GUI compatibility for command-line warriors like myself), then the bh5 formatting stuff can be sorted on the back end? I definitely agree with Carlo's structure set out in the file-format document; but all of that data will be floating around in memory in some shape or form and just needs to be wrapped up and packaged (according to Carlo's structure, e.g.) when the user is happy and presses "generate h5 filestore" etc. (?)

I definitely agree with the recommendation to create the alpha using mainly the requirements that our 3 labs would find useful (import filetypes, treatments, etc.), and then we can add more universal functionality second / get some beta testers in from other labs etc. I'm able to support whatever jobs need doing and am free to meet in the beginning of March like you mentioned, Kareem.

Hope you guys have a nice weekend,

Cheers,
Sal

---------------------------------------------------------------
Salvatore La Cavera III
Royal Academy of Engineering Research Fellow
Nottingham Research Fellow
Optics and Photonics Group
University of Nottingham
Email: salvatore.lacaveraiii@nottingham.ac.uk
ORCID iD: 0000-0003-0210-3102
Book a Coffee and Research chat with me! (https://outlook.office.com/bookwithme/user/6a3f960a8e89429cb6fc693c01d10119@...)
------------------------------------

From: Carlo Bevilacqua via Software
Sent: 12 February 2025 13:31
To: Kareem Elsayad
Cc: sebastian.hambura@embl.de; software@biobrillouin.org
Subject: [Software] Re: Software manuscript / BLS microscopy

Hi Kareem,

Thanks for restarting this, and sorry for my silence, I just came back from the US and was planning to start working on this again. Could you also add Sebastian (in CC) to the mailing list?

As you outlined, I would split the project into two parts: 1) getting from the raw data to some standard "processed" spectra and 2) doing data analysis/visualization on that. For the second part, the way I envision it is:
* The most updated definition of the file format from Pierre is this one (https://github.com/bio-brillouin/HDF5_BLS/blob/GUI_development/guides/Projec...), correct? In addition to this document, I think it would be good to have a more structured description of the file (like this: https://github.com/prevedel-lab/Brillouin-standard-file/blob/main/docs/Bh5_f...), where each field is clearly defined in terms of data type and dimensions. Sebastian was also suggesting that the document should contain the reasoning behind each specific choice in the specs, as well as things that we considered but decided had some issues (so in future we can look back at it). I still believe that the file format should contain the "processed" spectra (i.e. after FT and baseline subtraction for impulsive measurements, or after taking a line profile and possibly linearization for VIPA, ...) so we can apply standard data processing or visualization which is independent of the actual underlying technique (e.g. VIPA, FP, stimulated, time domain, ...).
* Agree on an API to read the data from our file format (most likely a Python class). For that we should 1) decide which information is important to extract (spectral information, spatial coordinates, hyper-parameters, metadata, ...) and 2) implement an interface to read the data (e.g. readSpectrumAtIndex(...), readImage(...), ...; a sketch follows below).
* Build a GUI that uses the previously defined API to show and process the data.

I would say that, once we have a very solid description of the file format as in step 1, step 2 will come naturally and we can divide the actual implementation between us. Step 3 can also easily be implemented and divided between us once we have an API (and I am happy to work on the GUI myself). The first half of the project is the trickiest, and I know Pierre has already done a lot of work in that direction.
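Such a reader class could look like this (readSpectrumAtIndex/readImage as named above; the class name, internal paths, and dataset shapes are assumptions for illustration):

    import h5py
    import numpy as np

    class BLSFile:
        """Hypothetical reader for the standard file; not the agreed API."""

        def __init__(self, path):
            self._f = h5py.File(path, "r")

        def readSpectrumAtIndex(self, group, i, j):
            """Return the spectrum at map position (i, j)."""
            return np.asarray(self._f[group]["Raw_data"][i, j, :])

        def readImage(self, group):
            """Return a 2D map, e.g. total intensity per point."""
            return np.asarray(self._f[group]["Raw_data"]).sum(axis=-1)

        def close(self):
            self._f.close()

    # usage (hypothetical file/group names):
    # f = BLSFile("measurement.h5")
    # spectrum = f.readSpectrumAtIndex("Data_0", 10, 20)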
We should definitely agree on the extent to which we can define a standard for storing the raw data, given the variability between labs (and probably we should do it for common techniques like FP or VIPA), and on how to implement the treatments, leaving the possibility to load some custom code into the pipeline to do the conversion. Let me know what you all think about this. If you agree, I would start by making a document which clearly defines the file format in a structured way (as in my step 1 above). @Pierre could you write a new document, or modify the document I originally made (https://github.com/prevedel-lab/Brillouin-standard-file/blob/main/docs/Bh5_f...), to reflect the latest structure you implemented for the file format? The metadata can still be defined in a separate Excel sheet, as long as the data type and format are well defined there.

Best regards,
Carlo

On Wed, Feb 12, 2025 at 02:19, Kareem Elsayad via Software wrote:

Hi Robert, Carlo, Sal, Pierre,

I think it would be good to follow up on the software project. Pierre has made some progress here, and it would be good to try and define tasks a little more clearly to make progress. There is always the potential issue of having "too many cooks in the kitchen" (that have different recipes for the same thing) to move forward efficiently, something that I've noticed can get quite confusing/frustrating when writing software together with people. So it would be good to clearly assign tasks. I talked to Pierre today, and he would be happy to integrate things into the framework we have to try to tie things together. What would foremost be needed are ways of treating data, meaning code that takes a raw spectral image and metadata and converts them into a "standard" format (spectral representation) that can then be fitted, and also "plugins" that serve a specific purpose in the analysis/rendering and can be included in the framework.

The way I see it (and please comment if you see differently), there are ~4 steps here:
* Take raw data (in .tif, .dat, .txt, etc. format) and metadata (in .csv, .xlsx, .dat, .txt, etc.) and render a standard spectral representation. Also take a provided instrument response in one of these formats and extract key parameters from it.
* Fit the data with a drop-down menu list of functions that will include different functional dependences and functions corrected for the instrument response (see the sketch below).
* Generate/display a visual representation of the results (frequency shift(s) and linewidth(s)) that is ideally interactive to some extent (and maybe has some funky features like looking at spectra at different points). These can be spatial maps and/or the evolution with some other parameter (time, temperature, angle, etc.). Also be able to display maps of relative peak intensities in the case of multiple-peak fits, and whatever else useful you can think of.
* Extract "mechanical" parameters given assigned refractive indices and densities.

I think the idea of fitting modified functions (e.g. corrected based on the instrument response) rather than deconvolving the spectra makes more sense, as it can account for more complex corrections due to non-optical anomalies in the future (ultimately even functional variations in the vicinity of e.g. phase transitions). It is also less error prone, as systematically doing deconvolution with non-ideal registration data can really throw you off the cliff, so to speak. My understanding is that we kind of agreed on an initial metadata reporting format. Getting from step 1 to step 2 will no doubt be the most challenging part, as it is very instrument specific, so instructions will need to be written for different BLS implementations.
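As an illustration of fitting an instrument-corrected function instead of deconvolving: a Lorentzian peak convolved with a Gaussian instrument response is a Voigt profile, which can be fitted directly (a toy sketch; all parameter values and the scipy-based approach are illustrative):

    import numpy as np
    from scipy.optimize import curve_fit
    from scipy.special import voigt_profile

    SIGMA_IRF = 0.10  # GHz, Gaussian instrument response width (assumed known)

    def model(x, shift, gamma, amp, offset):
        """Lorentzian (HWHM gamma) convolved with the Gaussian IRF."""
        return amp * voigt_profile(x - shift, SIGMA_IRF, gamma) + offset

    # toy spectrum: 5.1 GHz shift, 0.3 GHz HWHM, plus noise
    x = np.linspace(3, 7, 400)
    y = model(x, 5.1, 0.3, 1.0, 0.02) + 0.01 * np.random.randn(x.size)

    popt, _ = curve_fit(model, x, y, p0=[5.0, 0.2, 1.0, 0.0])
    print(f"shift = {popt[0]:.3f} GHz, linewidth (HWHM) = {popt[1]:.3f} GHz")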
This is a huge project if we want it to be all-inclusive, so I would suggest focusing on making it work for just a couple of modalities first (e.g. crossed VIPA, time-resolved, anisotropy, and maybe a time or temperature course of one of these). Extensions should then be easier to navigate. At some point I think it would be good to involve SBS-specific considerations as well.

I think it would be good to discuss for a while per email to gather thoughts and opinions (and already start to share code), and then plan a meeting for the beginning of March -- how does the first week of March look for everyone? I created this mailing list (software@biobrillouin.org) that we can use for discussion. You should all be able to post to it (and it makes it easier if we bring anyone else in along the way). At the moment the mailing list includes Robert, Carlo, Sal, Pierre, and myself. Let me know if I should add anyone.

All the best,
Kareem

_______________________________________________
Software mailing list -- software@biobrillouin.org
To unsubscribe send an email to software-leave@biobrillouin.org