To clarify this goal, we first need to understand what a database does.
When it comes to databases, people tend to assume they are mainly for storage, since the name contains the word "base".
But that is not the case: a database serves two important functions, computation and transaction processing, which are what we usually call OLAP and OLTP.
Storage exists to serve these two functions; merely acting as a store is not the goal of a database.
As we know, SQL is currently the mainstream database language.
So, is it convenient to do these two things in SQL?
The transaction function mainly ensures the consistency of data during reads and writes.
Although this is hard to implement, its interface to applications is very simple, and the code that reads from and writes to the database is also very simple.
If we assume that the current logical storage schema of relational databases is reasonable (that is, using data tables and records to store data;
whether it is reasonable is another complicated issue that will not be discussed in detail here), then describing transactions in SQL is not a big problem, because there is no need to describe complex actions and the complexity is already handled inside the database.
For computation, however, the situation is different.
The computation we are talking about here is a broad concept.
It is not just simple addition and subtraction; searching and joining can also be regarded as computation.
So here comes a question: what kind of computing system is good?
It needs two characteristics: easy to write and fast to run.
Easy to write is simple to understand: it lets programmers write code quickly, so that more work gets done per unit of time. Fast to run is even easier to understand: we certainly want the result in a shorter time.
In fact, the Q in SQL stands for Query.
SQL was invented to do queries (that is, computation), which is its primary goal.
However, it is hard to say that SQL is very competent at describing computing tasks.
# Why SQL is not competent
Let's start with easy to write.
Code written in SQL reads much like English, and some queries can be read and written almost as English sentences (there are many examples on the Internet, so we won't give any here).
This would seem to satisfy the requirement of easy to write.
Wait a moment!
The SQL we see in textbooks is often only two or three lines long and is indeed simple, but what if we try to solve a slightly more complicated problem?
Here is an example that is actually not very complicated: compute the maximum number of consecutive days on which a stock keeps rising.
Written in SQL, it looks like this:

    select max(consecutive_day)
    from (select count(*) consecutive_day
          from (select sum(rise_mark) over(order by trade_date) days_no_gain
                from (select trade_date,
                             case when closing_price > lag(closing_price) over(order by trade_date)
                                  then 0 else 1 end rise_mark
                      from stock_price))
          group by days_no_gain)

We won't explain how this statement works here; it is a little confusing anyway.
You can try to work it out yourself.
This was once a recruitment test question at Raqsoft, with a pass rate of less than 20%. Because it was too difficult, the test was later changed: candidates were asked to explain what the given SQL statement does, but unfortunately the pass rate was still not high.
What does this reveal?
It reveals that once the situation gets slightly complicated, SQL becomes hard to both read and write!
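For readers who would rather trace the statement than untangle its nesting, the same steps can be written out in a general-purpose language. The following is a minimal Python sketch, not part of the original test question: the function name `max_consecutive_span` is made up for illustration, and the input is assumed to be a list of closing prices already ordered by `trade_date`. Like the SQL, it counts the rows in each group, so the day on which a rising run starts is included in the count.

```python
from itertools import accumulate, groupby


def max_consecutive_span(closing_prices):
    """Mirror the nested SQL: mark rows, running sum, group, count, max."""
    if not closing_prices:
        return 0
    # rise_mark: 0 when the price rose versus the previous row, else 1.
    # The first row has no previous row (LAG is NULL), so it gets 1.
    rise_mark = [1] + [
        0 if cur > prev else 1
        for prev, cur in zip(closing_prices, closing_prices[1:])
    ]
    # days_no_gain: running total of rise_mark; it stays constant
    # for as long as the price keeps rising.
    days_no_gain = list(accumulate(rise_mark))
    # GROUP BY days_no_gain, COUNT(*) per group, then MAX over the counts.
    # The running total never decreases, so equal values are adjacent.
    return max(len(list(rows)) for _, rows in groupby(days_no_gain))
```

Each intermediate list corresponds to one level of subquery, which is why the SQL needs three levels of nesting for a problem that can be stated in one sentence.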
Now let's look at fast to run, using a simple task that is often used as an example: take the top 10 out of 100 million rows of data.
This task is not complicated to write in SQL:

    SELECT TOP 10 x FROM T ORDER BY x DESC

However, the execution logic corresponding to this statement is to sort all the data first, then take the top 10 and discard the rest.
As we all know, sorting is a very slow operation that traverses the data many times.
If the data is too large to fit in memory, it must also be buffered in external storage, which causes a further sharp drop in performance.
If the logic embodied in this statement is followed strictly, the operation cannot run fast.
Fortunately, many programmers know that this operation needs neither a full sort nor external buffering: it can be done in a single traversal using only a little memory, so a high-performance algorithm exists.
With SQL, we can only hope the database optimizer is smart enough to convert this statement into a high-performance algorithm, but the optimizer may not be reliable when the situation gets complicated.
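The single-traversal algorithm mentioned above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `top_n` is hypothetical, the values may arrive as any iterable, and the bounded min-heap is one common way to do it, not necessarily what any particular database or SPL uses internally.

```python
import heapq


def top_n(values, n=10):
    """Return the n largest values, largest first, in one pass."""
    if n <= 0:
        return []
    heap = []  # min-heap holding the n largest values seen so far
    for v in values:
        if len(heap) < n:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            # v beats the smallest of the current top n: swap it in.
            heapq.heapreplace(heap, v)
    return sorted(heap, reverse=True)
```

Memory use stays at `n` values no matter how many rows are traversed, and only the final `n` values are ever sorted.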
It seems that SQL does not do well in either aspect.
Although these two examples are not very complicated, SQL performs poorly in both.
In reality, hard-to-write and slow-running SQL abounds in code that runs to thousands of lines.
So why can't SQL do well in these two aspects?
To answer this question, we need to analyze what implementing a computation in program code actually involves.
Essentially, programming is the process of translating a problem-solving idea into a precise formal language that a computer can execute.
For example, a primary school pupil solving a word problem must, after analyzing the problem and coming up with a solution, still write down an expression using the four basic arithmetic operations.
Likewise, computing with a program requires not only coming up with the solution but also translating it into actions that the computer can understand and execute.
For a formal language used to describe computing methods, the core lies in the algebraic system it adopts.
Put simply, an algebraic system consists of two key elements: data types and the operation rules on them.
For example, the key elements of the arithmetic we learn in primary school are the integers and the operations of addition, subtraction, multiplication, and division.
Once we have both key elements, we can write the operations we want, using the symbols defined by the algebraic system, into something (that is, code) that the computer can then execute.
If an algebraic system is poorly designed, so that the data types and operations it provides are inconvenient, describing algorithms becomes very hard.
In that case, a strange phenomenon occurs: translating the solution into code is far more difficult than solving the problem itself.
For example, we learn in childhood to use Arabic numerals for everyday calculation; addition, subtraction, multiplication, and division are very convenient with them, so everyone naturally assumes that numerical operations should be like this.
But are all numerical operations so convenient?
Many people probably know of another numeral system, Roman numerals.
Do you know how to add, subtract, multiply, and divide with Roman numerals?
And how did the ancient Romans go shopping?
The reason coding is difficult is largely a matter of algebra.
Now let's look at the reason for not running fast.
Software cannot change the performance of hardware; the speed of the CPU and the hard disk depends on their own configuration.
However, we can design an algorithm of lower complexity, that is, one with a smaller amount of computation, so that the computer performs fewer actions and therefore naturally runs faster.
Yet working out the algorithm is not enough; we also need to implement it in some formal language, otherwise the computer won't execute it.
Moreover, it needs to be relatively simple to code.
If the code in a certain formal language is very long, writing it is troublesome and no one will use that language.
Therefore, for programs, easy to write and fast to run are really the same problem, and behind it is the algebra adopted by the formal language.
If the algebra is not good, high-performance algorithms become difficult or even impossible to implement, and as a result there is no way to run fast.
As mentioned above, the algorithm we want, which uses little memory and traverses the data only once, cannot be expressed in SQL.
Consequently, if you want it to run fast, you can only place your hope on the optimizer.
Let's make another analogy:
Anyone who has been to primary school probably knows the story of Gauss calculating 1+2+3+…+100.
Ordinary pupils used the most primitive method, adding the numbers one by one, while little Gauss was very smart: he noticed that 1+100=101, 2+99=101, …, 50+51=101, so he multiplied 50 by 101, quickly worked out the answer, and went home for lunch.
After hearing this story, we all feel that Gauss was very clever to think of such an ingenious solution, which is both simple and fast.
Yes, that's right, but one point is easy to overlook: in Gauss's day, multiplication already existed in the human arithmetic system (which is also an algebra)!
As mentioned earlier, because we learn the four arithmetic operations in childhood, we take it for granted that multiplication is available.
But in fact it is not!
Multiplication was invented after addition.
If multiplication had not yet been invented in Gauss's day, he could not have found a way to solve this problem quickly, no matter how clever he was.
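The contrast in the story can be made concrete with a small Python sketch; the function names are made up for illustration. The first version uses only addition and needs one step per number, while the second relies on multiplication being available and needs a constant number of steps.

```python
def sum_by_addition(n):
    """Add 1..n one number at a time: n additions."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


def sum_by_pairing(n):
    """Gauss's pairing idea: n/2 pairs, each summing to n + 1."""
    return n * (n + 1) // 2
```

Both give 5050 for `n = 100`, but the second can only be written down in an arithmetic that has multiplication.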
At present, the mainstream database is the relational database, so called because its mathematical basis is relational algebra.
SQL is exactly the formal language developed from the theory of relational algebra.
Now we can answer why SQL is not competent in the two aspects we expect.
The problem lies in relational algebra, which is like an arithmetic system with only addition and no multiplication.
Therefore, it is inevitable that many things cannot be done well.
Relational algebra was invented fifty years ago.
The application requirements and hardware environments of fifty years ago differ hugely from today's.
Continuing to apply a fifty-year-old theory to today's problems: doesn't that sound outdated?
However, this is the reality.
Because of the large number of existing users and the lack of mature new technologies, SQL, based on relational algebra, is still the most important database language today.
Although some improvements have been made in recent decades, the foundation has not changed.
In the face of contemporary complex requirements and hardware environments, it is understandable that SQL is not competent.
And, unfortunately, this problem is at the theoretical level, so no amount of engineering optimization will cure it: it can only be improved to a limited extent, not eradicated.
Regrettably, most database developers do not think at this level, or, in order to preserve compatibility for existing users, do not intend to think at this level.
As a result, the mainstream database industry has been going around in circles in this limited space.
# Why SPL is competent
So how do we make computation easier to write and faster to run?
Invent a new algebra! An algebra with "multiplication", and then design a new language based on the new algebra.
This is where SPL comes from.
Its theoretical basis is no longer relational algebra but something called the discrete dataset.
The formal language designed on this new algebra is named SPL (Structured Process Language).
SPL innovates against the shortcomings of SQL (more precisely, the discrete dataset innovates against various deficiencies of relational algebra).
SPL redefines and extends many operations on structured data; specifically, it adds discreteness, enhances ordered computation, implements thorough set orientation, supports object references, and advocates step-by-step computation.
Recoding the earlier problems in SPL will give you a direct feel.
Compute the maximum number of consecutive days on which a stock keeps rising:

    stock_price.sort(trade_date).group@i(closing_price<closing_price[-1]).max(~.len())

Although the computing idea is the same as in the earlier SQL, it is much easier to express and no longer confusing, thanks to the introduction of the ordering characteristic.
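To make the idea behind the expression explicit, here is a minimal Python emulation. It is a sketch under assumptions, not SPL's implementation: it assumes that `group@i` starts a new group whenever its condition is true, which is how the expression above reads, and the helper name `group_on_break` and the sample data are hypothetical.

```python
def group_on_break(rows, starts_new_group):
    """Split an ordered list into consecutive groups.

    A new group begins at the first row and at every row for which
    starts_new_group(current, previous) is true.
    """
    groups = []
    for i, row in enumerate(rows):
        if i == 0 or starts_new_group(row, rows[i - 1]):
            groups.append([row])
        else:
            groups[-1].append(row)
    return groups


# Hypothetical sample data: (trade_date, closing_price), deliberately unsorted.
stock_price = [
    ("2024-01-04", 11.0), ("2024-01-01", 10.0), ("2024-01-03", 12.0),
    ("2024-01-02", 11.0), ("2024-01-05", 12.0), ("2024-01-06", 13.0),
]

ordered = sorted(stock_price)  # ISO dates sort correctly as strings
# Start a new group whenever the price is lower than on the previous day.
runs = group_on_break(ordered, lambda cur, prev: cur[1] < prev[1])
longest_run = max(len(run) for run in runs)
```

The three steps (sort, split where the price falls, take the longest group) map one-to-one onto the three chained calls in the SPL expression.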
Take the top 10 out of 100 million rows of data:

    T.groups(;top(-10,x))

SPL has richer set data types, so it is easy to describe an efficient algorithm that performs this as a simple aggregation in a single traversal, without a full sort.
Due to space limitations, we will not introduce SPL (the discrete dataset) comprehensively here, but will list some of its differential improvements over SQL (relational algebra):
## Discrete records
Records in the discrete dataset are a basic data type that can exist independently of a data table.
A data table is a set composed of records, and the records that make up one data table can also be used to make up other data tables.
For example, a filtering operation uses the records of the original data table that meet the condition to form a new data table, which has advantages in both space usage and performance.
Relational algebra has no data type that can represent a record.
A single record is actually a data table with only one row, and records in different data tables cannot be the same record.
For example, during a filtering operation, new records are copied to form a new data table, which increases the cost in both space and time.
In particular, because records are discrete, the discrete dataset allows a record's field value to be another record, which makes foreign key joins easier to implement.
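The two points above, filtering without copying and a field whose value is a record, can be illustrated with ordinary object references. This is a rough Python analogy with made-up data, not a description of how SPL stores records.

```python
# Hypothetical data: each order's "customer" field holds the customer
# record itself rather than a key that must be looked up.
customers = [{"id": "C1", "city": "Paris"}, {"id": "C2", "city": "Oslo"}]
orders = [
    {"id": 1, "amount": 120, "customer": customers[0]},
    {"id": 2, "amount": 80, "customer": customers[1]},
    {"id": 3, "amount": 300, "customer": customers[1]},
]

# Filtering builds a new collection that refers to the same records;
# no record is copied.
large_orders = [o for o in orders if o["amount"] >= 100]
shares_records = all(any(o is src for src in orders) for o in large_orders)

# The "foreign key join" becomes a plain field access through the reference.
cities = [o["customer"]["city"] for o in large_orders]
```

Because `large_orders` holds references, its cost is one pointer per matching record, and reading the customer's city needs no lookup at query time.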
## Ordering characteristic
Relational algebra is designed on unordered sets, whose members have no concept of sequence number.
Moreover, it provides no mechanism for positional computation or adjacent reference.
In practice, SQL has made some partial improvements, which allow modern SQL to perform some ordered operations fairly easily.
By contrast, sets in the discrete dataset are ordered: every member has a sequence number and can be accessed by it.
Moreover, the discrete dataset defines positioning operations that return the sequence numbers of members in a set.
It also provides symbols for adjacent reference in set operations and supports computation relative to the position of a given sequence number in the set.
Ordered operations are very common, but they have always been difficult for SQL.
Implementing them in SQL is still very cumbersome even with window functions available.
SPL greatly improves this situation, as the earlier stock-rising example illustrates.
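The three capabilities described above (access by sequence number, positioning, and adjacent reference) look like this on an ordinary ordered list. The sketch uses Python with made-up prices purely as an analogy; note that Python positions are 0-based.

```python
closing = [10.0, 11.5, 11.0, 12.5, 12.0]  # hypothetical prices, in date order

# Access by sequence number.
third = closing[2]

# Positioning: return where a member sits instead of the member itself.
peak_pos = max(range(len(closing)), key=closing.__getitem__)

# Adjacent reference: each member is compared with the one before it.
changes = [cur - prev for prev, cur in zip(closing, closing[1:])]
```

In SQL the last line needs a window function such as `lag(...) over(order by ...)`, as in the stock example earlier.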
## Discreteness and set orientation
Relational algebra defines rich set operations; that is, it can treat a set as a whole in operations such as aggregation and grouping.
This is where SQL is more convenient than high-level programming languages like Java.
However, relational algebra has very poor discreteness and no discrete records, while high-level programming languages such as Java have no problem in this regard.
The discrete dataset combines discreteness with set orientation: it has set data types and the related operations, and set members can also exist outside the set, to be operated on independently or to form other sets.
Therefore, it can be said that SPL combines the advantages of both SQL and Java.
Ordered operations are a typical scenario that combines discreteness with set orientation.
The concept of order is meaningful only for a set and meaningless for a single member, which reflects set orientation; an ordered operation needs to compute on a given member and its adjacent members, which requires discreteness.
Only with the support of discreteness can we obtain more thorough set orientation and solve problems like ordered operations.
In short, the discrete dataset is an algebraic system with both discreteness and set orientation, while relational algebra has only set orientation.
## Understanding of grouping
The original purpose of grouping is to split a large set into several subsets according to some rule.
In relational algebra, since there is no data type that can represent a set of sets, grouping must be followed immediately by aggregation.
By contrast, the discrete dataset allows sets of sets, so it can represent the result of grouping itself.
Grouping and the aggregation after grouping are split into two independent steps.
In this way, more complex operations can be performed on the grouped subsets.
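Keeping the grouped subsets as a first-class result can be sketched as follows. This is a Python analogy with hypothetical names and data: `group_by` returns the subsets themselves, and a second, independent step then does something that is not a plain aggregate, namely listing the employees who earn more than their department's average.

```python
from collections import defaultdict


def group_by(rows, key):
    """Step 1: split rows into subsets; no aggregation happens here."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return dict(groups)


employees = [
    {"name": "Ann", "dept": "R&D", "salary": 90},
    {"name": "Bob", "dept": "R&D", "salary": 70},
    {"name": "Cid", "dept": "Sales", "salary": 50},
    {"name": "Dee", "dept": "Sales", "salary": 60},
    {"name": "Eve", "dept": "Sales", "salary": 40},
]

by_dept = group_by(employees, key=lambda e: e["dept"])

# Step 2: any computation on each subset, decided after grouping.
above_avg = {}
for dept, members in by_dept.items():
    avg = sum(m["salary"] for m in members) / len(members)
    above_avg[dept] = [m["name"] for m in members if m["salary"] > avg]
```

In SQL the same question usually needs a subquery or a window function, because the grouped subsets cannot be held as a value.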
In relational algebra, there is only one kind of grouping, equivalence grouping, in which the set is divided according to the grouping key value.
Equivalence grouping is a complete partition.
The discrete dataset, however, regards any method of splitting a large set as a grouping operation.
In addition to conventional equivalence grouping, it also provides ordered grouping, which works with the ordering characteristic, and alignment grouping, which may produce an incomplete partition.
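As an illustration of how a grouping can be an incomplete partition, here is one plausible reading of alignment grouping, sketched in Python. The function name `align_group` and the sample data are hypothetical, and the exact SPL semantics may differ: rows are grouped against a given list of keys, in that order, so rows matching no listed key are dropped and a listed key with no rows yields an empty group.

```python
def align_group(rows, key, wanted_keys):
    """Group rows against a fixed, ordered list of keys."""
    buckets = {k: [] for k in wanted_keys}
    for row in rows:
        k = key(row)
        if k in buckets:  # rows with an unlisted key are left out
            buckets[k].append(row)
    # One group per wanted key, in the order the keys were given.
    return [buckets[k] for k in wanted_keys]


sales = [("East", 10), ("West", 7), ("North", 3), ("East", 5)]
aligned = align_group(sales, key=lambda r: r[0],
                      wanted_keys=["West", "East", "South"])
```

Here the "North" row belongs to no group and the "South" group is empty, so the result is not a partition of the original set.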
## Understanding of aggregation
There is no explicit set data type in relational algebra.
The result of an aggregation is always a single value, and the same holds for aggregation after grouping; the aggregations include only SUM, COUNT, MAX, MIN, and so on.
In particular, relational algebra cannot treat TOPN as an aggregation.
TOPN on a whole set can only be done by taking the first N rows after sorting when the result set is output.
On grouped subsets, TOPN is hard to implement; one has to change approach and compute sequence numbers to achieve it.
The discrete dataset advocates universal sets: the result of an aggregation is not necessarily a single value and may itself be a set.
In the discrete dataset, TOPN has the same status as SUM and COUNT;
that is, it can be performed on a whole set or on grouped subsets.
Because SPL treats TOPN as an aggregation, in practice it can avoid sorting all the data and thus achieve high performance.
In SQL, however, TOPN always comes with an ORDER BY.
In theory, it can only be implemented by a full sort, so in practice you have to place your hope on the database's optimization.
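Treating TOPN as an aggregation whose result is a set can be sketched like this. It is a minimal Python illustration with hypothetical names, assuming rows arrive as any iterable; each group keeps a bounded min-heap, so all groups are handled in a single traversal without sorting the data.

```python
import heapq
from collections import defaultdict


def top_n_per_group(rows, key, value, n):
    """Aggregate each group to its n largest values in one pass."""
    if n <= 0:
        return {}
    heaps = defaultdict(list)  # group key -> min-heap of at most n values
    for row in rows:
        heap = heaps[key(row)]
        v = value(row)
        if len(heap) < n:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            heapq.heapreplace(heap, v)
    # Only the n survivors of each group are sorted, never the full data.
    return {k: sorted(h, reverse=True) for k, h in heaps.items()}


rows = [("a", 5), ("b", 1), ("a", 9), ("a", 7), ("b", 4), ("b", 3)]
top2 = top_n_per_group(rows, key=lambda r: r[0], value=lambda r: r[1], n=2)
```

The accumulator for TOPN is updated row by row exactly as a running SUM or COUNT would be; the only difference is that its result is a small set instead of a single value.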
## High performance supported by the ordering characteristic
The discrete dataset places particular emphasis on ordered sets and can implement many high-performance algorithms by exploiting order.
This cannot be done in relational algebra, which is based on unordered sets, and one can only hope for optimization in practice.
The following are some low-complexity operations that can be implemented using the ordering characteristic:
Items 3) and 4) above rely on the modified definition of the join operation in the discrete dataset.
If we kept the definition in relational algebra (which may produce many-to-many results), such low-complexity algorithms would be hard to implement.
In addition to the theoretical differences, SPL has many engineering-level advantages, such as parallel computing code that is easy to write, large-memory pre-association to improve foreign key join performance, and a unique columnar storage mechanism that supports arbitrary segmentation and parallel computing.
In the era of big data, we are often concerned with high-performance computing.
Here are some big data algorithms implemented in SPL:
And some high-performance cases: