JabRef references

Li C, Gupta S, Rana S, Nguyen V, Venkatesh S and Shilton A (2017), "High Dimensional Bayesian Optimization Using Dropout", In International Joint Conference on Artificial Intelligence, pp. 2096-2102.

Abstract: Scaling Bayesian optimization to high dimensions is a challenging task as the global optimization of a high-dimensional acquisition function can be expensive and often infeasible. Existing methods depend either on limited “active” variables or the additive form of the objective function. We propose a new method for high-dimensional Bayesian optimization that uses a dropout strategy to optimize only a subset of variables at each iteration. We derive theoretical bounds for the regret and show how it can inform the derivation of our algorithm. We demonstrate the efficacy of our algorithms for optimization on two benchmark functions and two real-world applications: training cascade classifiers and optimizing alloy composition.

BibTeX:

@inproceedings{cheng2017high,
  author = {Li, Cheng and Gupta, Sunil and Rana, Santu and Nguyen, Vu and Venkatesh, Svetha and Shilton, Alistair},
  title = {High Dimensional Bayesian Optimization Using Dropout},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year = {2017},
  pages = {2096--2102}
}

Dai Nguyen T, Gupta S, Rana S and Venkatesh S (2017), "Stable Bayesian Optimization", In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 578-591. Springer.

BibTeX:

@inproceedings{dai2017stable,
  author = {Dai Nguyen, Thanh and Gupta, Sunil and Rana, Santu and Venkatesh, Svetha},
  title = {Stable Bayesian Optimization},
  booktitle = {Pacific-Asia Conference on Knowledge Discovery and Data Mining},
  publisher = {Springer},
  year = {2017},
  pages = {578--591}
}

Dao B, Nguyen T, Venkatesh S and Phung D (2017), "Latent sentiment topic modelling and nonparametric discovery of online mental health-related communities", International Journal of Data Science and Analytics. Vol. 4(3), pp. 209-231. Springer.

BibTeX:

@article{dao2017latent,
  author = {Dao, Bo and Nguyen, Thin and Venkatesh, Svetha and Phung, Dinh},
  title = {Latent sentiment topic modelling and nonparametric discovery of online mental health-related communities},
  journal = {International Journal of Data Science and Analytics},
  publisher = {Springer},
  year = {2017},
  volume = {4},
  number = {3},
  pages = {209--231}
}

Karmakar C, Udhayakumar RK, Li P, Venkatesh S and Palaniswami M (2017), "Stability, consistency and performance of distribution entropy in analysing short length heart rate variability (HRV) signal", Frontiers in Physiology. Vol. 8, pp. 720. Frontiers.

Abstract: Distribution entropy (DistEn) is a recently developed measure of complexity that is used to analyse heart rate variability (HRV) data. Its calculation requires two input parameters: the embedding dimension m, and the number of bins M, which replaces the tolerance parameter r that is used by the existing approximation entropy (ApEn) and sample entropy (SampEn) measures. The performance of DistEn can also be affected by the data length N. In our previous studies, we have analyzed stability and performance of DistEn with respect to one parameter (m or M) or combination of two parameters (N and M). However, the impact of varying all three input parameters on DistEn has not yet been studied. Since DistEn is predominantly aimed at analysing short length heart rate variability (HRV) signal, it is important to comprehensively study the stability, consistency and performance of the measure using multiple case studies. In this study, we examined the impact of changing input parameters on DistEn for synthetic and physiological signals. We also compared the variations of DistEn and performance in distinguishing physiological (Elderly from Young) and pathological (Healthy from Arrhythmia) conditions with ApEn and SampEn. The results showed that DistEn values are minimally affected by the variations of input parameters compared to ApEn and SampEn. DistEn also showed the most consistent and the best performance in differentiating physiological and pathological conditions with various input parameters among reported complexity measures. In conclusion, DistEn is found to be the best measure for analysing short length HRV time series.

BibTeX:

@article{karmakar2017stability,
  author = {Karmakar, Chandan and Udhayakumar, Radhagayathri K and Li, Peng and Venkatesh, Svetha and Palaniswami, Marimuthu},
  title = {Stability, consistency and performance of distribution entropy in analysing short length heart rate variability (HRV) signal},
  journal = {Frontiers in Physiology},
  publisher = {Frontiers},
  year = {2017},
  volume = {8},
  pages = {720},
  url = {https://www.frontiersin.org/articles/10.3389/fphys.2017.00720/full},
  doi = {10.3389/fphys.2017.00720}
}

Li C, de Celis Leal DR, Rana S, Gupta S, Sutti A, Greenhill S, Slezak T, Height M and Venkatesh S (2017), "Rapid Bayesian optimisation for synthesis of short polymer fiber materials", Scientific Reports. Vol. 7(1), pp. 5683. Nature Publishing Group.

Abstract: The discovery of processes for the synthesis of new materials involves many decisions about process design, operation, and material properties. Experimentation is crucial but as complexity increases, exploration of variables can become impractical using traditional combinatorial approaches. We describe an iterative method which uses machine learning to optimise process development, incorporating multiple qualitative and quantitative objectives. We demonstrate the method with a novel fluid processing platform for synthesis of short polymer fibers, and show how the synthesis process can be efficiently directed to achieve material and process objectives.

BibTeX:

@article{li2017rapid,
  author = {Li, Cheng and de Celis Leal, David Rub{\'\i}n and Rana, Santu and Gupta, Sunil and Sutti, Alessandra and Greenhill, Stewart and Slezak, Teo and Height, Murray and Venkatesh, Svetha},
  title = {Rapid Bayesian optimisation for synthesis of short polymer fiber materials},
  journal = {Scientific Reports},
  publisher = {Nature Publishing Group},
  year = {2017},
  volume = {7},
  number = {1},
  pages = {5683},
  url = {https://www.nature.com/articles/s41598-017-05723-0},
  doi = {10.1038/s41598-017-05723-0}
}

Nguyen T, Larsen ME, O'Dea B, Nguyen DT, Yearwood J, Phung D, Venkatesh S and Christensen H (2017), "Kernel-based features for predicting population health indices from geocoded social media data", Decision Support Systems. Vol. 102, pp. 22-31.

Abstract: When using tweets to predict population health index, due to the large scale of data, an aggregation of tweets by population has been a popular practice in learning features to characterize the population. This would alleviate the computational cost for extracting features on each individual tweet. On the other hand, much information on the population could be lost as the distribution of textual features of a population could be important for identifying the health index of that population. In addition, there could be relationships between features and those relationships could also convey predictive information of the health index. In this paper, we propose mid-level features namely kernel-based features for prediction of health indices of populations from social media data. The kernel-based features are extracted on the distributions of textual features over population tweets and encode the relationships between individual textual features in a kernel function. We implemented our features using three different kernel functions and applied them for two case studies of population health prediction: across-year prediction and across-county prediction. The kernel-based features were evaluated and compared with existing features on a dataset collected from the Behavioral Risk Factor Surveillance System dataset. Experimental results show that the kernel-based features gained significantly higher prediction performance than existing techniques, by up to 16.3%, suggesting the potential and applicability of the proposed features in a wide spectrum of applications on data analytics at population levels.

BibTeX:

@article{NGUYEN201722,
  author = {Nguyen, Thin and Larsen, Mark E and O'Dea, Bridianne and Nguyen, Duc Thanh and Yearwood, John and Phung, Dinh and Venkatesh, Svetha and Christensen, Helen},
  title = {Kernel-based features for predicting population health indices from geocoded social media data},
  journal = {Decision Support Systems},
  year = {2017},
  volume = {102},
  pages = {22--31},
  url = {http://www.sciencedirect.com/science/article/pii/S0167923617301227},
  doi = {10.1016/j.dss.2017.06.010}
}

Nguyen T, Venkatesh S and Phung D (2017), "Academia versus social media: A psycho-linguistic analysis", Journal of Computational Science, pp. 228-237. Elsevier.

Abstract: Publication pressure has influenced the way scientists report their experimental results. Recently it has been found that scientific outcomes have been exaggerated or distorted (spin) to hopefully be published. Apart from investigating the content to look for spins, language styles have been proven to be good traces. For example, the use of words in emotion lexicons has been used to interpret exaggeration and overstatement in academia. This work adapts a data-driven approach to explore a comprehensive set of psycho-linguistic features for a large corpus of PubMed papers published for the last four decades. The language features for other media – online encyclopedia (Wikipedia), online diaries (web-logs), online forums (Reddit), and micro-blogs (Twitter) – are also extracted. Several binary classifications are employed to discover linguistic predictors of scientific abstracts versus other media as well as strong predictors of scientific articles in different cohorts of impact factors and author affiliations. Trends of language styles expressed in scientific articles over the course of 40 years have also been discovered, providing the evolution of academic writing for the period of time. The study demonstrates advances in lightning-fast cluster computing on dealing with large scale data, consisting of 5.8 terabytes of data containing 3.6 billion records from all the media. The good performance of the advanced cluster computing framework suggests the potential of pattern recognition in data at scale.

BibTeX:

@article{nguyen2017academia,
  author = {Nguyen, Thin and Venkatesh, Svetha and Phung, Dinh},
  title = {Academia versus social media: A psycho-linguistic analysis},
  journal = {Journal of Computational Science},
  publisher = {Elsevier},
  year = {2017},
  pages = {228--237},
  url = {https://www.sciencedirect.com/science/article/pii/S1877750317309122}
}

Nguyen V, Gupta S, Rana S, Li C and Venkatesh S (2017), "Bayesian optimization in weakly specified search space", In 2017 IEEE International Conference on Data Mining (ICDM), pp. 347-356.

BibTeX:

@inproceedings{nguyen2017bayesian,
  author = {Nguyen, Vu and Gupta, Sunil and Rana, Santu and Li, Cheng and Venkatesh, Svetha},
  title = {Bayesian optimization in weakly specified search space},
  booktitle = {2017 IEEE International Conference on Data Mining (ICDM)},
  year = {2017},
  pages = {347--356},
  url = {http://ieeexplore.ieee.org/abstract/document/8215507/},
  doi = {10.1109/ICDM.2017.44}
}

Nguyen T, Larsen ME, O'Dea B, Phung D, Venkatesh S and Christensen H (2017), "Estimation of the prevalence of adverse drug reactions from social media", International Journal of Medical Informatics. Vol. 102, pp. 130-137. Elsevier.

Abstract: This work aims to estimate the degree of adverse drug reactions (ADR) for psychiatric medications from social media, including Twitter, Reddit, and LiveJournal. Advances in lightning-fast cluster computing were employed to process large scale data, consisting of 6.4 terabytes of data containing 3.8 billion records from all the media. Rates of ADR were quantified using the SIDER database of drugs and side-effects, and an estimated ADR rate was based on the prevalence of discussion in the social media corpora. Agreement between these measures for a sample of ten popular psychiatric drugs was evaluated using the Pearson correlation coefficient, r, with values between 0.08 and 0.50. Word2vec, a novel neural learning framework, was utilized to improve the coverage of variants of ADR terms in the unstructured text by identifying syntactically or semantically similar terms. Improved correlation coefficients, between 0.29 and 0.59, demonstrate the capability of advanced techniques in machine learning to aid in the discovery of meaningful patterns from medical data, and social media data, at scale.

BibTeX:

@article{nguyen2017estimation,
  author = {Nguyen, Thin and Larsen, Mark E and O'Dea, Bridianne and Phung, Dinh and Venkatesh, Svetha and Christensen, Helen},
  title = {Estimation of the prevalence of adverse drug reactions from social media},
  journal = {International Journal of Medical Informatics},
  publisher = {Elsevier},
  year = {2017},
  volume = {102},
  pages = {130--137},
  url = {http://www.sciencedirect.com/science/article/pii/S1386505617300746}
}

Nguyen P, Tran T, Wickramasinghe N and Venkatesh S (2017), "Deepr: A Convolutional Net for Medical Records", IEEE Journal of Biomedical and Health Informatics. Vol. 21(1), pp. 22-30. IEEE.

BibTeX:

@article{nguyen2017mathtt,
  author = {Nguyen, Phuoc and Tran, Truyen and Wickramasinghe, Nilmini and Venkatesh, Svetha},
  title = {Deepr: A Convolutional Net for Medical Records},
  journal = {IEEE Journal of Biomedical and Health Informatics},
  publisher = {IEEE},
  year = {2017},
  volume = {21},
  number = {1},
  pages = {22--30}
}

Nguyen T, Nguyen DT, Larsen ME, O'Dea B, Yearwood J, Phung D, Venkatesh S and Christensen H (2017), "Prediction of Population Health Indices from Social Media using Kernel-based Textual and Temporal Features", In Proceedings of the 26th International Conference on World Wide Web Companion, pp. 99-107.

Abstract: From 1984, the US has annually conducted the Behavioral Risk Factor Surveillance System (BRFSS) surveys to capture either health behaviors, such as drinking or smoking, or health outcomes, including mental, physical, and generic health, of the population. Although this kind of information at a population level, such as US counties, is important for local governments to identify local needs, traditional datasets may take years to collate and to become publicly available. Geocoded social media data can provide an alternative reflection of local health trends. In this work, to predict the percentage of adults in a county reporting "insufficient sleep", a health behavior, and, at the same time, their health outcomes, novel textual and temporal features are proposed. The proposed textual features are defined at mid-level and can be applied on top of various low-level textual features. They are computed via kernel functions on underlying features and encode the relationships between individual underlying features over a population. To further enrich the predictive ability of the health indices, the textual features are augmented with temporal information. We evaluated the proposed features and compared them with existing features using a dataset collected from the BRFSS. Experimental results show that the combination of kernel-based textual features and temporal information predict well both the health behavior (with best performance at rho=0.82) and health outcomes (with best performance at rho=0.78), demonstrating the capability of social media data in prediction of population health indices. The results also show that our proposed features gained higher correlation coefficients than did the existing ones, increasing the correlation coefficient by up to 0.16, suggesting the potential of the approach in a wide spectrum of applications on data analytics at population levels.

BibTeX:

@inproceedings{nguyen2017prediction,
  author = {Nguyen, Thin and Nguyen, Duc Thanh and Larsen, Mark E and O'Dea, Bridianne and Yearwood, John and Phung, Dinh and Venkatesh, Svetha and Christensen, Helen},
  title = {Prediction of Population Health Indices from Social Media using Kernel-based Textual and Temporal Features},
  booktitle = {Proceedings of the 26th International Conference on World Wide Web Companion},
  year = {2017},
  pages = {99--107},
  url = {https://dl.acm.org/citation.cfm?id=3054136}
}

Nguyen V, Gupta S, Rana S, Li C and Venkatesh S (2017), "Regret for Expected Improvement over the Best-Observed Value and Stopping Condition", In Asian Conference on Machine Learning, pp. 279-294.

Abstract: Bayesian optimization (BO) is a sample-efficient method for global optimization of expensive, noisy, black-box functions using probabilistic methods. The performance of a BO method depends on its selection strategy through the acquisition function. Expected improvement (EI) is one of the most widely used acquisition functions for BO that finds the expectation of the improvement function over the incumbent. We derive a sublinear convergence rate for EI. Our analysis is the first to study a stopping criterion for EI to prevent unnecessary evaluations.

BibTeX:

@inproceedings{nguyen2017regret,
  author = {Nguyen, Vu and Gupta, Sunil and Rana, Santu and Li, Cheng and Venkatesh, Svetha},
  title = {Regret for Expected Improvement over the Best-Observed Value and Stopping Condition},
  booktitle = {Asian Conference on Machine Learning},
  year = {2017},
  pages = {279--294},
  url = {http://proceedings.mlr.press/v77/nguyen17a.html}
}

Nguyen T, O'Dea B, Larsen M, Phung D, Venkatesh S and Christensen H (2017), "Using linguistic and topic analysis to classify sub-groups of online depression communities", Multimedia Tools and Applications. Vol. 76(8), pp. 10653-10676. Springer.

Abstract: Depression is a highly prevalent mental health problem and is a co-morbidity of other mental, physical, and behavioural disorders. The internet allows individuals who are depressed or caring for those who are depressed, to connect with others via online communities; however, the characteristics of these discussions have not yet been fully explored. This work aims to explore the textual cues of online communities interested in depression. A total of 5,000 posts were randomly selected from 24 online communities. Five subgroups of online communities were identified: Depression, Bipolar Disorder, Self-Harm, Grief/Bereavement, and Suicide. Psycholinguistic features and content topics were extracted from the posts and analysed. Machine learning techniques were used to discriminate the online conversations in the depression communities from the other subgroups. Topics and psycholinguistic features were found to be highly valid predictors of community subgroup. Clear discrimination between linguistic features and topics, alongside good predictive power is an important step in understanding social media and its use in mental health.

BibTeX:

@article{nguyen2017using,
  author = {Nguyen, Thin and O'Dea, Bridianne and Larsen, Mark and Phung, Dinh and Venkatesh, Svetha and Christensen, Helen},
  title = {Using linguistic and topic analysis to classify sub-groups of online depression communities},
  journal = {Multimedia Tools and Applications},
  publisher = {Springer},
  year = {2017},
  volume = {76},
  number = {8},
  pages = {10653--10676},
  url = {https://link.springer.com/article/10.1007/s11042-015-3128-x}
}

Pham T, Tran T, Phung D and Venkatesh S (2017), "Column Networks for Collective Classification", In The AAAI Conference on Artificial Intelligence (AAAI), pp. 2485-2491.

BibTeX:

@inproceedings{pham2017column,
  author = {Pham, Trang and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha},
  title = {Column Networks for Collective Classification},
  booktitle = {The AAAI Conference on Artificial Intelligence (AAAI)},
  year = {2017},
  pages = {2485--2491}
}

Pham T, Tran T, Phung D and Venkatesh S (2017), "Predicting healthcare trajectories from medical records: A deep learning approach", Journal of Biomedical Informatics. Vol. 69, pp. 218-229. Elsevier.

BibTeX:

@article{pham2017predicting,
  author = {Pham, Trang and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha},
  title = {Predicting healthcare trajectories from medical records: A deep learning approach},
  journal = {Journal of Biomedical Informatics},
  publisher = {Elsevier},
  year = {2017},
  volume = {69},
  pages = {218--229}
}

Rana S, Li C, Gupta S, Nguyen V and Venkatesh S (2017), "High dimensional Bayesian optimization with elastic Gaussian process", In International Conference on Machine Learning, pp. 2883-2891.

Abstract: Bayesian optimization is an efficient way to optimize expensive black-box functions such as designing a new product with highest quality or tuning the hyperparameters of a machine learning algorithm. However, it has a serious limitation when the parameter space is high-dimensional as Bayesian optimization crucially depends on solving a global optimization of a surrogate utility function in the same sized dimensions. The surrogate utility function, known commonly as the acquisition function, is a continuous function but can be extremely sharp in high dimensions, having only a few peaks marooned in a large terrain of almost flat surface. Global optimization algorithms such as DIRECT are infeasible at higher dimensions and gradient-dependent methods cannot move if initialized in the flat terrain. We propose an algorithm that enables local gradient-dependent algorithms to move through the flat terrain by using a sequence of gross-to-finer Gaussian process priors on the objective function as we leverage two underlying facts: a) there exist large enough length-scales for which the acquisition function can be made to have a significant gradient at any location in the parameter space, and b) the extrema of the consecutive acquisition functions are close although they are different only due to a small difference in the length-scales. Theoretical guarantees are provided and experiments clearly demonstrate the utility of the proposed method on both benchmark test functions and real-world case studies.

BibTeX:

@inproceedings{rana2017high,
  author = {Rana, Santu and Li, Cheng and Gupta, Sunil and Nguyen, Vu and Venkatesh, Svetha},
  title = {High dimensional Bayesian optimization with elastic Gaussian process},
  booktitle = {International Conference on Machine Learning},
  year = {2017},
  pages = {2883--2891},
  url = {http://proceedings.mlr.press/v70/rana17a/rana17a.pdf}
}

Rana S, Gupta S, Venkatesh S and Sutti A (2017), "Systems and methods for making a product" (WO2017173489A1)

BibTeX:

@patent{rana2017systems,
  author = {Rana, Santu and Gupta, Sunil and Venkatesh, Svetha and Sutti, Alessandra},
  title = {Systems and methods for making a product},
  year = {2017},
  number = {WO2017173489A1},
  url = {https://patents.google.com/patent/WO2017173489A1}
}

Saha B, Gupta S, Phung D and Venkatesh S (2017), "Effective sparse imputation of patient conditions in electronic medical records for emergency risk predictions", Knowledge and Information Systems. Vol. 53(1), pp. 179-206. Springer.

BibTeX:

@article{saha2017effective,
  author = {Saha, Budhaditya and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha},
  title = {Effective sparse imputation of patient conditions in electronic medical records for emergency risk predictions},
  journal = {Knowledge and Information Systems},
  publisher = {Springer},
  year = {2017},
  volume = {53},
  number = {1},
  pages = {179--206}
}

Saha B, Gupta S, Phung D and Venkatesh S (2017), "A framework for mixed-type multioutcome prediction with applications in healthcare", IEEE Journal of Biomedical and Health Informatics. Vol. 21(4), pp. 1182-1191. IEEE.

BibTeX:

@article{saha2017framework,
  author = {Saha, Budhaditya and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha},
  title = {A framework for mixed-type multioutcome prediction with applications in healthcare},
  journal = {IEEE Journal of Biomedical and Health Informatics},
  publisher = {IEEE},
  year = {2017},
  volume = {21},
  number = {4},
  pages = {1182--1191}
}

Shilton A, Gupta S, Rana S and Venkatesh S (2017), "Regret Bounds for Transfer Learning in Bayesian Optimisation", In Artificial Intelligence and Statistics, pp. 307-315.

Abstract: This paper studies the regret bound of two transfer learning algorithms in Bayesian optimisation. The first algorithm models any difference between the source and target functions as a noise process. The second algorithm proposes a new way to model the difference between the source and target as a Gaussian process which is then used to adapt the source data. We show that in both cases the regret bounds are tighter than in the no transfer case. We also experimentally compare the performance of these algorithms relative to no transfer learning and demonstrate benefits of transfer learning.

BibTeX:

@inproceedings{shilton2017regret,
  author = {Shilton, Alistair and Gupta, Sunil and Rana, Santu and Venkatesh, Svetha},
  title = {Regret Bounds for Transfer Learning in Bayesian Optimisation},
  booktitle = {Artificial Intelligence and Statistics},
  year = {2017},
  pages = {307--315},
  url = {http://proceedings.mlr.press/v54/shilton17a/shilton17a.pdf}
}

Nguyen T, Nguyen H, Venkatesh S and Phung D (2017), "Estimating Support Scores of Autism Communities in Large-Scale Web Information Systems", In International Conference on Web Information Systems Engineering, pp. 347-355.

Abstract: Individuals with Autism Spectrum Disorder (ASD) have been shown to prefer communication at a socio-spatial distance. So while rarely found in the real world, autism communities are popular in Web-based forums, convenient for people with ASD to seek and share health related information. Reddit is one such avenue for people of common interest to connect, forming communities of specific interest, namely subreddits. This work aims to estimate support scores provided by a popular subreddit interested in ASD – www.reddit.com/r/aspergers. The scores were measured in both the quantities and qualities of the conversations in the forum, including conversational involvement, emotional, and informational support. The support scores of the subreddit Aspergers were compared with those of an average subreddit derived from the entire Reddit, represented by two big corpora of approximately 200 million Reddit posts and 1.66 billion Reddit comments. The ASD subreddit was found to be a supportive community, having far higher support scores than did the average subreddit. Apache Spark, an advanced cluster computing framework, is employed to speed up processing of the large corpora. Scalable machine learning techniques implemented in Spark help discriminate the content made in Aspergers versus other subreddits and automatically discover linguistic predictors of ASD within minutes, providing timely reports.

BibTeX:

@inproceedings{thin2017estimating,
  author = {Nguyen, Thin and Nguyen, Hung and Venkatesh, Svetha and Phung, Dinh},
  title = {Estimating Support Scores of Autism Communities in Large-Scale Web Information Systems},
  booktitle = {International Conference on Web Information Systems Engineering},
  year = {2017},
  pages = {347--355},
  url = {https://link.springer.com/chapter/10.1007/978-3-319-68783-4_24}
}

Tran T, Phung D, Bui H and Venkatesh S (2017), "Hierarchical semi-Markov conditional random fields for deep recursive sequential data", Artificial Intelligence. Vol. 246, pp. 53-85. Elsevier.

BibTeX:

@article{tran2017hierarchical,
  author = {Tran, Truyen and Phung, Dinh and Bui, Hung and Venkatesh, Svetha},
  title = {Hierarchical semi-Markov conditional random fields for deep recursive sequential data},
  journal = {Artificial Intelligence},
  publisher = {Elsevier},
  year = {2017},
  volume = {246},
  pages = {53--85}
}

Vellanki P, Duong T, Phung D and Venkatesh S (2017), "Data mining of intervention for children with autism spectrum disorder", In eHealth 360°, pp. 376-383. Springer.

BibTeX:

@inproceedings{vellanki2017data,
  author = {Vellanki, Pratibha and Duong, Thi and Phung, Dinh and Venkatesh, Svetha},
  title = {Data mining of intervention for children with autism spectrum disorder},
  booktitle = {eHealth 360°},
  publisher = {Springer},
  year = {2017},
  pages = {376--383}
}

Vellanki P, Duong T, Gupta S, Venkatesh S and Phung D (2017), "Nonparametric discovery and analysis of learning patterns and autism subgroups from therapeutic data", Knowledge and Information Systems. Vol. 51(1), pp. 127-157. Springer.

BibTeX:

@article{vellanki2017nonparametric,
  author = {Vellanki, Pratibha and Duong, Thi and Gupta, Sunil and Venkatesh, Svetha and Phung, Dinh},
  title = {Nonparametric discovery and analysis of learning patterns and autism subgroups from therapeutic data},
  journal = {Knowledge and Information Systems},
  publisher = {Springer},
  year = {2017},
  volume = {51},
  number = {1},
  pages = {127--157}
}

Vellanki P, Rana S, Gupta S, Rubin D, Sutti A, Dorin T, Height M, Sanders P and Venkatesh S (2017), "Process-constrained batch Bayesian optimisation", In Advances in Neural Information Processing Systems, pp. 3417-3426.

Abstract: Prevailing batch Bayesian optimisation methods allow all control variables to be freely altered at each iteration. Real-world experiments, however, often have physical limitations making it time-consuming to alter all settings for each recommendation in a batch. This gives rise to a unique problem in BO: in a recommended batch, a set of variables that are expensive to experimentally change need to be fixed, while the remaining control variables can be varied. We formulate this as a process-constrained batch Bayesian optimisation problem. We propose two algorithms, pc-BO(basic) and pc-BO(nested). pc-BO(basic) is simpler but lacks a convergence guarantee. In contrast, pc-BO(nested) is slightly more complex, but admits convergence analysis. We show that the regret of pc-BO(nested) is sublinear. We demonstrate the performance of both pc-BO(basic) and pc-BO(nested) by optimising benchmark test functions, tuning hyper-parameters of the SVM classifier, optimising the heat-treatment process for an Al-Sc alloy to achieve target hardness, and optimising the short polymer fibre production process.

BibTeX:

@inproceedings{vellanki2017process,
  author = {Vellanki, Pratibha and Rana, Santu and Gupta, Sunil and Rubin, David and Sutti, Alessandra and Dorin, Thomas and Height, Murray and Sanders, Paul and Venkatesh, Svetha},
  title = {Process-constrained batch Bayesian optimisation},
  booktitle = {Advances in Neural Information Processing Systems},
  year = {2017},
  pages = {3417--3426},
  url = {http://papers.nips.cc/paper/6933-process-constrained-batch-bayesian-optimisation}
}

Venkatesh S and Christensen H (2017), "Using life's digital detritus to feed discovery", The Lancet Psychiatry. Vol. 4(3), pp. 181.

BibTeX:

@article{venkatesh2017using,
  author = {Venkatesh, Svetha and Christensen, Helen},
  title = {Using life's digital detritus to feed discovery},
  journal = {The Lancet Psychiatry},
  year = {2017},
  volume = {4},
  number = {3},
  pages = {181},
  url = {http://www.thelancet.com/journals/lanpsy/article/PIIS2215-0366(16)30351-0/fulltext},
  doi = {10.1016/S2215-0366(16)30351-0}
}

Vu H, Nguyen TD, Travers A, Venkatesh S and Phung D (2017), "Energy-Based Localized Anomaly Detection in Video Surveillance", In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 641-653.

BibTeX:

@inproceedings{vu2017energy,
  author = {Vu, Hung and Nguyen, Tu Dinh and Travers, Anthony and Venkatesh, Svetha and Phung, Dinh},
  title = {Energy-Based Localized Anomaly Detection in Video Surveillance},
  booktitle = {Pacific-Asia Conference on Knowledge Discovery and Data Mining},
  year = {2017},
  pages = {641--653}
}