Either your web browser doesn't support Javascript or it is currently turned off. In the latter case, please turn on Javascript support in your web browser and reload this page. Data searching and retrieval is one of the fundamental functionalities in many Web of Things applications, which need to collect, process and analyze huge amounts of sensor stream data. The problem in fact has been well studied for data generated by sensors that are installed at fixed locations; however, challenges emerge along with the popularity of opportunistic sensing applications in which mobile sensors keep reporting observation and measurement data at variable intervals and changing geographical locations. To address these challenges, we develop the Geohash-Grid Tree, a spatial indexing technique specially designed for searching data integrated from heterogeneous sources in a mobile sensing environment. Results of the experiments on a real-world dataset collected from the SmartSantander smart city testbed show that the index structure allows efficient search based on spatial distance, range and time windows in a large time series database.

Author:Faehn Grorg
Language:English (Spanish)
Published (Last):28 November 2015
PDF File Size:4.9 Mb
ePub File Size:10.84 Mb
Price:Free* [*Free Regsitration Required]

No downtime is expected, but site performance may be temporarily impacted. Refworks Account Login. Open Collections. UBC Theses and Dissertations. Featured Collection. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Our proposed technique is to first mitigate the dimensionality curse problem by approximating each trajectory with a low order polynomial-like curve, and then incorporate a multidimensional index into the re-duced space of polynomial coefficients. There are many possible ways to choose the polynomial, including Fourier transforms, splines, non-linear regressions, etc.

Some of these possibilities have indeed been studied before. We hypothesize that one of the best approaches is the polynomial that minimizes the maximum deviation from the true value, which is called the minimax polynomial.

Minimax approximation is particularly meaningful for indexing because in a branch-and-bound search i. In general, among all the polynomials of the same degree, the optimal minimax polynomial is very hard to compute. However, it has been shown that the Chebyshev approximation is almost identical to the optimal mini-max polynomial, and is easy to compute [32].

That is, we show that the Euclidean distance between two d-dimensional trajectories is lower bounded by the weighted Euclidean distance between the two vectors of Chebyshev coefficients. This lemma is not trivial to show, and it ensures that indexing with Chebyshev coefficients admits no false negatives. To complement the analytic re-sult, we conduct comprehensive experimental evaluation with real' and generated 1-dimensional to 4-dimensional data sets. Multiple Indices 76 6.

Raymond T. Ng, for his rewarding guidance and constant support throughout the course of my work. It was his supervision and inspiration that have not only enabled me to complete this thesis but also led me to the right direction of research.

Additionally, special thanks go to both Dr. I would also like to thank Dr. Jason Harrison and Dr. Michiel van de Panne for providing me with their motion capture data. I did most of my research in the Database Systems Laboratory, where there was an enlightening and stimulating academic environment. My appreciation is extended to all my colleagues for their friendship and help. I owe many thanks to Dr. Edwin Knorr for his valuable comments and precious suggestions on my thesis.

Finally, I am indebted to my parents for their love, education and encour-agement during the eighteen years of my schooling. For a second example, Vi may capture the 2-dimensional or 3-dimensional coordinates of a moving object at time U, in which case we have a spatiotemporal trajectory. For yet another example, a trajectory may represent the change of the attributes or features of an entity over time. For example, doctors monitor the health conditions of their patients by keeping track of their body temperatures, geologists record monthly or annual rainfall data for weather forecasting, and financial analysts try to find patterns in their large pools of stock prices of different companies.

There are also many large collections of higher-dimensional spatiotemporal trajectories, thanks in part to the development of cost-effective mobile technolo-gies, such as Geographic Information Systems, wireless communication electronics, and multimedia applications [14, 43, 56]. Examples include spatiotemporal trajec-tories of cars, airplanes, and other moving objects generated by motion tracking devices in surveillance applications and electronic games applications.

Additionally, a video stream can also be regarded as a multidimensional trajectory, as it consists of a sequence of multiple frames, each of which is characterized by a set of feature attributes. Specifically, as part of our collaboration with an electronic games company, we encounter large collections of 2-, 3- and 4-dimensional spatiotemporal trajecto-ries. A 3-dimensional example is the positions of aircrafts during a flight simulation.

Finally, a 4-dimensional example is the four angles of body joints of a person playing kung-fu or dancing. This type of data sets is useful for games developers and medical professionals.

The point here is that beyond 1-dimensional 2 time series, applications of higher-dimensional spatiotemporal trajectories are very common. Given those enormous databases of trajectories, what can we do with them, and how do we retrieve valuable information from them? One of the fundamental operations in mining trajectories is similarity matching, which refers to the pro-cess of finding trajectories that are similar to a given query trajectory.

Similarity matching is useful in two aspects. First, it is a subroutine of many data mining tasks, such as classification, clustering, rule discovery, outlier detection, and query by contents. Second, it is important in its own right for exploratory data analysis. In general, they can be classified into two categories: metric functions and non-metric functions. In most cases, a metric function is desired, because the triangle inequality can then be used to prune the index during search.

Definition 1. A simple variant of 1. As a result, many attempts have been made to come up with distance functions that are invariant with respect to six transformations: shifting, uniform amplitude scaling, uniform time scaling, uniform bi-scaling, time warping and non-uniform amplitude scaling.

Unfortunately, none of them is a metric. In effect, it. One of the advantages is that it is robust to noise by giving more weight to the similar portions and paying less attention to regions of great dissimilarity. For example, first-order landmarks are global or local extrema, second-order landmarks are inflection points, and so on. It is claimed to better match human intuition and episodic memory as it takes smoothing into account by letting certain landmarks be overshadowed by others.

In this thesis, we adopt the Euclidean distance function Disteuc for spa-tiotemporal trajectories. While this distance function is easy to compute, it is natu-ral for many applications of spatiotemporal trajectories, including those for airplanes and other flying objects. Additionally, it allows scalable solutions to other problems such as clustering and classification. It is also the distance function adopted by most studies on indexing time series, including [21]. For more advanced distance functions such as time-warping [4] and longest common subsequence [53], we consider them future topics of investigation.

While we shall discuss related works in greater detail in Chap-ter 2, it suffices to say that most existing frameworks are based on piecewise approx-imations, where each piece is either constant or linear.

However, recall that, among the examples cited in Section 1. This is because all those activities e. That is to say, a smooth and continuous trajectory is approximated with a piecewise discontinuous function. This mismatch may cause an unnecessary error or deviation, and may lead to a loss in pruning power in a branch-and-bound search. In this thesis, we seek to approximate and index a d-dimensional spatiotem-poral trajectory with a low order continuous polynomial-like curve.

There are many possible ways to choose the polynomial, including continuous Fourier transforms, splines, non-linear regression, and so on. While all approximations are not exact by definition, the approximation that minimizes the maximum deviation from the true value is very desirable.

This is called the minimax approximation. Minimax ap-proximation is particularly meaningful for indexing because in a branch-and-bound search i. However, in general, among all the polynomials of the same degree, the optimal minimax polynomial is very hard to compute.

It has been shown that the Chebyshev approximation is almost identical 7 to the optimal minimax polynomial, and is easy to compute [32]. Thus, we shall explore how to use the Chebyshev polynomials as a basis for indexing d-dimensional trajectories. If a searching mechanism retrieves a proper subset S of R, then the wrongly dismissed trajectories in R — S are called false dismissals or false negatives.

On the other hand, if S is a proper superset of R, then the wrongly retrieved trajectories in S — R are called false alarms or false positives. As we can always remove false positives in a post-processing stage, they can be tolerated as long as there are not too many of them.

Searching techniques that guarantee no false negatives are said to be exact; however, there are studies which consider providing faster approximate search at the expense of allowing both false positives and negatives [46, 25]. The most obvious brute-force solution for similarity matching would be a sequential scan of the whole database, in which we compute the distance between every trajectory x G DB and q, and return x if it qualifies. This approach requires that we access every single page in the database, which is clearly unrealistic for large data sets.

Any mechanisms that avoid retrieving all the data pages could potentially increase the speed of the search, which automatically entails the idea of using an index. Thus, it is discrete in nature.

We show how to approximate such a discrete "function" with Chebyshev polynomials. We first begin with the 1-dimensional case of time series. Our representation scheme allows us to prove a main result of this thesis - the Lower Bounding Lemma. That is, the true distance between two time series is lower-bounded by the distance in the index space i. As shown in Chapter 4, this is not a trivial result to prove. Specifically, a d-dimensional trajectory is projected onto each dimension to create d 1-dimensional trajectories.

We show that this projection preserves the Lower Bounding Lemma. We also give algorithms for building an index 9 of Chebyshev coefficients, and for supporting similarity searching of whole trajectories.

We use 1- to 4- dimensional real data sets, as well as generated i. We also extend A P C A to d-dimensional situations as a "straw man" strategy. Our empirical results indicate that Chebyshev approximation can deliver a 3- to 5- fold reduction on the dimensionality of the index space. For instance, it only takes 4 to 6 Chebyshev coefficients to deliver the same pruning power produced by 20 A P C A coefficients.

This is a very important advantage. As the dimensionality curse on the indexing structure is bound to set in sooner or later, Chebyshev coefficients are far more effective than A P C A in delivering additional pruning power before that happens. In Chapter 2, we discuss related works. In Chapter 3, we review Chebyshev polynomials and their properties central to the 10 development of this thesis. In Chapter 4, we show how to approximate a time series with a Chebyshev polynomial, and give an example.


Piecewise Chebyshev Factorization based Nearest Neighbour Classification for Time Series

Agrawal , C. Efficient similarity search in sequence databases. In Proc. Note: F-index. Fast time sequence indexing for arbitrary lp norms.


Spatial Indexing for Data Searching in Mobile Sensing Environments.

DOI : Agrawal, C. Faloutsos, and A. Swami , Efficient similarity search in sequence databases , Proc. Assent, R.

Related Articles