Thursday 30 December 2021

Data analysis to find simple investing strategy

Data analysis to find a simple investing strategy for decent returns by testing how well the stock market follows the Pareto principle.


Data source (30 Dec 2021) - 5-year return on equity (ROE) data for the companies listed in the Nifty 500, exported from the paid version of tickertape.in.


Histogram for visualization showing number of companies vs percentage return.

Mean: 14.6%
Median: 14.88%
Standard deviation: 12.88%

Above 28.9%: 37 companies (7.4%)
Below 5%: 78 companies (15.6%)
Between 5% and 28.9%: 386 companies (77%)
Conclusion - The Pareto principle can be followed. A portfolio with about 20% losers and 80% winners can give an average annual return of about 15% in long-term investing.
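As a rough illustration of how the figures above can be reproduced, here is a minimal Python sketch. The function name `summarize` and the choice of population standard deviation are assumptions, not the method actually used for the post. The bucket counts (37, 78, 386) are the ones reported above, and the raw ROE values themselves are not included here.

```python
from statistics import mean, median, pstdev

def summarize(roe_values, low=5.0, high=28.9):
    """Summary statistics and bucket counts/shares for ROE percentages.

    Assumption: population standard deviation (pstdev); the post does not
    say whether a sample or population formula was used.
    """
    n = len(roe_values)
    below = sum(1 for r in roe_values if r < low)
    above = sum(1 for r in roe_values if r > high)
    middle = n - below - above
    return {
        "mean": mean(roe_values),
        "median": median(roe_values),
        "std_dev": pstdev(roe_values),
        "below": (below, 100 * below / n),
        "middle": (middle, 100 * middle / n),
        "above": (above, 100 * above / n),
    }

# Bucket counts reported in the post.
counts = {"above": 37, "below": 78, "middle": 386}
total = sum(counts.values())
# Share of each bucket in percent, rounded to one decimal place.
shares = {name: round(100 * c / total, 1) for name, c in counts.items()}
print(total, shares)
```

The counts add up to 501 rows, and the recomputed shares match the percentages reported above.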


Optimization can be done based on various factors, but that may not necessarily be useful.

In a hypothetical case, the highest possible return on a portfolio of 15 stocks containing only the top performers is 53%. Finding such multi-baggers is surely very difficult, though, and even a dedicated analysis effort may find only 1-2 of them for a portfolio of 15 stocks.
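To get a feel for what 1-2 such finds would mean, the arithmetic can be sketched as below. This is only an illustration under loud assumptions: equal weights, each top performer earning the 53% average of the top group, and every other stock earning the 14.6% mean reported above. The function name `blended_return` is hypothetical.

```python
def blended_return(n_stocks, n_top, top_return, base_return):
    """Equal-weight average return (%) of a portfolio with n_top top performers.

    Assumptions: equal weights; every top performer earns top_return and
    every other stock earns base_return.
    """
    if not 0 <= n_top <= n_stocks:
        raise ValueError("n_top must be between 0 and n_stocks")
    return (n_top * top_return + (n_stocks - n_top) * base_return) / n_stocks

# 15-stock portfolio with 0, 1, 2 or 15 top performers.
for n_top in (0, 1, 2, 15):
    print(n_top, round(blended_return(15, n_top, 53.0, 14.6), 2))
```

Under these assumptions, one top performer lifts the portfolio average from 14.6% to about 17.2%, and two lift it to about 19.7%.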

Note - The Nifty 50 ETF gave 16% ROE (5-year average).


PS - When insurance companies or big pharma work on insurance premiums, drug pricing, etc., normal-distribution data of a similar kind on a representative population may become highly useful. The use and harm of data depend on the purpose it is used for, i.e. to maximize profit, to maximize access, or somewhere in between.


Wednesday 29 December 2021

Impact of Open-Access De-identified case records

While current efforts focus on adoption and interoperability to give patients 24*7 global access to their data as a longitudinal record of their life journey, there is no technologically feasible and safe way to provide open access to de-identified data, only debates and tiny projects, which may someday be used at global scale (and would be useful even at a smaller scale). The low-hanging fruit may be to first enable global access to all research papers, or to all clinical trial data for better critical appraisal, or at least the reporting of outcomes of all trials, including those with negative outcomes. The most urgent goal is surely global equitable access to COVID vaccines. The paragraph below is taken from a project that uses open-access de-identified data in a real-world scenario to alleviate human suffering. It is neither directly in healthcare nor at global scale, but it is still surely useful.


"In almost every state, courts can jail people who fail to pay fines, fees, and other court debts—even those resulting from traffic or other non-criminal violations. While imprisoning someone for failing to pay a debt remains illegal on paper, these aggressive debt-enforcement tactics have led to the de facto reemergence of debtors’ prisons. Many believe that thousands of people across the country are jailed each year for unpaid fines and fees, but a dearth of data has made it difficult to rigorously assess and curb modern-day debt imprisonment practices. To address this data gap, we’re compiling an extensive database documenting debt imprisonment. Ultimately, we will anonymize the data and publish them for researchers, civil rights advocates, law enforcement officers, and other criminal justice stakeholders." 

ref - https://law.stanford.edu/event/stanford-computational-policy-lab-debtors-prisons-project/ 

Debtor's Prison - https://en.wikipedia.org/wiki/Debtors%27_prison



Hopefully someday open access de-identified health data will help others to get freedom from the disease prison.

Cardiology - NNT visualization

NNT - Number needed to treat

explanation here - https://www.thennt.com/thennt-explained/

ref for data - https://www.thennt.com/home-nnt/

The NNT data available on thennt.com for the cardiology specialty is visualized on a logarithmic scale to give insight into various interventions. No bar is drawn where, based on the available evidence, the drug helped no one in the given scenario. The purpose of the visualizations was to understand the realistic range of effects to expect from drugs.


Insights:

- 18 out of 48 have no effect.

- 3 have NNT 5 or less

- 6 have NNT 6 to 10

- 12 have NNT 11 to 50

- 6 have NNT 51-100

- 4 have NNT above 100

- 1 has NNT above 1500, i.e. aspirin to prevent a first heart attack or stroke.
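For readers new to the measure, NNT is the reciprocal of the absolute risk reduction (event rate without treatment minus event rate with treatment). The Python sketch below shows that calculation and why a logarithmic scale suits values ranging from about 5 to above 1500. The function names and the example event rates are hypothetical and are not taken from thennt.com.

```python
import math

def nnt(control_event_rate, treated_event_rate):
    """Number needed to treat = 1 / absolute risk reduction.

    Returns None when the treatment shows no benefit (no one helped).
    """
    arr = control_event_rate - treated_event_rate
    if arr <= 0:
        return None
    return 1 / arr

def bar_height(nnt_value):
    """Bar length on a log10 scale; None means no bar is drawn."""
    if nnt_value is None:
        return None
    return math.log10(nnt_value)

# Hypothetical example: events fall from 10% to 8% with treatment.
example = nnt(0.10, 0.08)
print(round(example), bar_height(example))

# On a log10 scale, NNT 5 and NNT 1500 differ by about 2.5 units
# instead of a factor of 300.
print(bar_height(5), bar_height(1500))
```

In the hypothetical example, a 2 percentage point reduction gives an NNT of about 50, meaning roughly 50 people need to be treated for one to benefit.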



fig 1 - image




fig 2 - Original image


download excel sheet ->



sorted list ->

Friday 24 December 2021

OpenEHR with FHIR for more power

FHIR (Fast Healthcare Interoperability Resources) is being adopted at tremendous speed, which opens up scope for successful nationwide interoperability, and it is also easy to get started with and implement. OpenEHR also addresses the interoperability challenge as one of its features. While FHIR adoption and its ecosystem are maturing in India, the interoperability problem may be best tackled with FHIR.

Combining OpenEHR with FHIR and clinical terminologies (e.g. LOINC, SNOMED CT) makes it possible to build more powerful electronic medical records, because of the archetypes in OpenEHR.

"Archetypes define the possible clinical content, representing a model that originates from actual clinical practice. Archetypes have a governance boundary around them, essentially representing a clinical sign off that can be used to support the idea of a clinical data standard. One or more archetypes can then be used in a template that represents a specific clinical use case. A simple example could be the development of a Body Mass Index (BMI) app. In order to achieve this, we need three principle archetypes to capture height, weight and the BMI itself. The benefit of using these standard archetypes is that they can also be inserted into other templates that require the same clinical data. And when that data is stored in a common way, it can be queried and reused, reducing the burden of repeated data entry." 1

"Reuse of data  - A key attribute of any clinical data repository is its ability to facilitate reuse for additional clinical requirements and for audit and reporting purposes. The Archetype Query Language exists to support this. AQL provides a means of performing queries on the CDR for individual or multiple records at either the patient or archetype level, maintaining data provenance and exporting for more advanced analysis where needed."1


While clinical terminologies help in analysing data, AQL adds another dimension of data analysis, i.e. querying at the patient or archetype level in a much easier way.
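The BMI example from the quote can be sketched in Python to show the idea of reusing the same archetypes and then querying by archetype or by patient. This is a toy model only: the classes `Archetype` and `Entry`, the functions `bmi_entry` and `query`, and the patient ids are all hypothetical, and none of this is the real openEHR reference model or AQL syntax.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Archetype:
    """Toy stand-in for a reusable clinical content definition."""
    name: str
    unit: str

# The three archetypes named in the BMI example.
HEIGHT = Archetype("height", "cm")
WEIGHT = Archetype("weight", "kg")
BMI = Archetype("body_mass_index", "kg/m2")

@dataclass
class Entry:
    """One stored value for one patient, tagged with its archetype."""
    patient_id: str
    archetype: Archetype
    value: float

def bmi_entry(height: Entry, weight: Entry) -> Entry:
    """Derive a BMI entry from stored height and weight (no re-entry of data)."""
    metres = height.value / 100
    return Entry(height.patient_id, BMI, round(weight.value / metres ** 2, 1))

def query(records: List[Entry], archetype: Archetype,
          patient_id: Optional[str] = None) -> List[Entry]:
    """Filter at the archetype level, optionally narrowed to one patient."""
    return [e for e in records
            if e.archetype == archetype
            and (patient_id is None or e.patient_id == patient_id)]

records = [
    Entry("p1", HEIGHT, 180.0), Entry("p1", WEIGHT, 81.0),
    Entry("p2", HEIGHT, 160.0), Entry("p2", WEIGHT, 64.0),
]
records.append(bmi_entry(records[0], records[1]))
records.append(bmi_entry(records[2], records[3]))

all_bmi = query(records, BMI)        # archetype-level query
p1_weight = query(records, WEIGHT, "p1")  # patient-level query
print([e.value for e in all_bmi], p1_weight[0].value)
```

Because every entry carries its archetype, the same height and weight data can feed other templates without being entered again.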

fig 1 - example archetype


Conclusion -

Design choices:

- using FHIR for Interoperability (across the globe). 

- using OpenEHR Archetypes for data storage (writing to database and reusing in templates)

- AQL on CDR and clinical terminologies for more powerful analytics.
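As a rough sketch of the first design choice, a stored body weight could be exposed for exchange as a FHIR Observation resource coded with LOINC. The function name `weight_to_fhir_observation` and the patient id are hypothetical, and only a minimal set of fields is shown, which has not been validated against any national profile.

```python
import json

def weight_to_fhir_observation(patient_id: str, weight_kg: float) -> dict:
    """Build a minimal FHIR Observation (as a dict) for a body weight value.

    Assumption: the patient is referenced as Patient/<id> on the same server.
    """
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [{
                "system": "http://loinc.org",
                "code": "29463-7",  # LOINC code for body weight
                "display": "Body weight",
            }]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "valueQuantity": {
            "value": weight_kg,
            "unit": "kg",
            "system": "http://unitsofmeasure.org",  # UCUM units
            "code": "kg",
        },
    }

observation = weight_to_fhir_observation("p1", 81.0)
print(json.dumps(observation, indent=2))
```

The value stays stored once in the archetype-based repository, and the FHIR resource is generated only when the data needs to be shared.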

fig 2 - FHIR + OpenEHR



references -

1- https://echalliance.com/what-is-openehr-and-why-is-it-important/

fig - https://ckm.openehr.org/ckm/archetypes/1013.1.3574/mindmap

fig - https://www.youtube.com/watch?v=biEXVRzjWmw&t=841s&ab_channel=openEHR