Michael Peter Christen
276a66a793
Adding a limit of 1000 links that a parser shall store during indexing.
A limit was necessary because some web pages have such huge numbers of
links that they can easily cause an OOM just by the number of links.
The question whether the limit of 1000 links is sufficient or too weak must
be answered by testing this feature.
13 years ago
Michael Peter Christen
de903a53a0
parser refactoring & hacks
13 years ago
Michael Peter Christen
1825f165b8
better integration of blacklist according to use case
13 years ago
Michael Peter Christen
ce8d4b87d9
fixes for new eclipse 'Juno' warning 'Resource leak'.
13 years ago
Michael Peter Christen
0c345d1559
giving threads names so it's easier to see what's happening during
debugging and within a thread dump
13 years ago
Michael Peter Christen
508a81b86c
added solr field 'refresh_s' which stores the refresh url contained in
the meta-refresh html header field.
13 years ago
Michael Peter Christen
f3167def64
do not fill the keywords with title content if keywords do not exist.
13 years ago
Michael Peter Christen
77f795756c
fixing redirects and status codes: storing of status code in
ResponseHeader to make it available for late evaluations, like storage
in solr.
13 years ago
Michael Peter Christen
dbdd697f4d
moved RDFaParser.xsl configuration file to defaults
13 years ago
Michael Peter Christen
786be7d175
better integration of RDFaParser
13 years ago
Michael Peter Christen
de3ef8ad73
removed unimportant warnings
13 years ago
Michael Peter Christen
24bbe359ca
also integrate geonames library files for fewer cities. These are more
useful for tagging since fewer normal words are falsely identified as
locations
13 years ago
Michael Peter Christen
223a5440ab
preventing that an empty pnd is inserted into the vocabularies
13 years ago
Michael Peter Christen
963f92ed9a
- merged files
- changed behaviour of delete button in vocabulary edit
- fixed size number in vocabulary listing
13 years ago
Michael Peter Christen
dd88d0ace2
more logging
13 years ago
Michael Peter Christen
94d54e2d91
added recognition of multi-word terms in vocabulary matching
this makes the PND usable: it is now possible to recognize persons and
navigate with a 'Persons' facet.
13 years ago
Michael Peter Christen
64c0268b2b
show triplestore metadata in yacydoc and viewfile
13 years ago
Michael Peter Christen
c2f0d16d2c
fixed vocabulary initialization
13 years ago
Michael Peter Christen
df3531f8d5
added the generation of virtual vocabularies using the pnd
13 years ago
Michael Peter Christen
a0f1decd82
- added loading of the dbpedia pnd triplestore in the dictionary loader
- renamed the dictionary loader to knowledge loader
- some refactoring in the library provider method names
13 years ago
Michael Peter Christen
16d8f33795
added objectlink generation to vocabulary generation and editor
13 years ago
Michael Peter Christen
d45718251e
refactoring (Localization -> Location)
13 years ago
Michael Peter Christen
b8b3c87ba7
- renamed localization to location (that was confusing)
- renamed 'Locale' navigator to 'Location'
- produce Location navigation only if geolocation libraries are loaded
13 years ago
Michael Peter Christen
e89747bb67
- added automated generation of vocabularies from url stubs
- added clear of all terms for vocabularies
- added deletion of vocabularies
13 years ago
Michael Peter Christen
79464189a4
The 'Locale' vocabulary, which is generated from geo data, now has the
objectspace "http://dbpedia.org/resource/"
13 years ago
Michael Peter Christen
61bb52d55c
- using http://purl.org/dc/terms/references to refer from an
auto-annotated document to a 'pseudo-linked' document which has an url
created with an object-prefix as defined in the vocabulary file
13 years ago
Michael Peter Christen
50c576599b
allow multiple parser options instead of printing an error
13 years ago
Michael Peter Christen
8b53771db2
changed behavior of navigation processing:
- vocabulary annotation is no longer written into the metadata of urldb
- vocabularies are written into the jena triplestore using an rdf
vocabulary
- vocabularies for rdf triples must be updated; refactoring done
- with the new navigation tags in the triplestore a faster
pre-urldb-lookup is possible: navigation is now processed within the RWI
during pre-ranking retrieval
- also added an Owl vocabulary stub to add the plain-text url to the
triplestore using the owl:sameas predicate
13 years ago
Michael Peter Christen
5fc6524ca8
- moved triple store to net.yacy.cora.lod (should be generalized there
later)
- added abstract add, delete, get methods in the triplestore
- added generation of triples after auto-annotation
- migrated all MultiProtocolURI objects to DigestURI in the parser since
the url hash is needed as subject value in the triples in the triple
store
13 years ago
cominch
bbfc53b663
bugfix
13 years ago
cominch
65c5826d93
bugfix
Conflicts:
source/net/yacy/document/parser/augment/AugmentParser.java
13 years ago
cominch
5f8ba7f4f2
small changes
Conflicts:
source/net/yacy/document/parser/augment/AugmentParser.java
source/net/yacy/interaction/Interaction.java
13 years ago
cominch
90512640bf
Added config switches for custom parser
Conflicts:
source/net/yacy/document/TextParser.java
13 years ago
cominch
bcbd8eee33
Add several parsers, for RDFa and rdf files.
Conflicts:
source/net/yacy/document/TextParser.java
13 years ago
cominch
9cbfc1a1c0
augmentedProxy, which forwards every proxy request to a
rewrite engine to customize existing webpages. originally implemented by
Florian Richter.
Conflicts:
source/de/anomic/http/server/HTTPDProxyHandler.java
13 years ago
Michael Peter Christen
cde20911bb
saved a bit more ram using UTF8 String compression for OpenGeoDB and
Geonames data files.
13 years ago
Michael Peter Christen
225ee42879
made the GeoLocation into an interface with the current
integer implementation as accuracy implementation of 1.863cm
13 years ago
Michael Peter Christen
96e9d77270
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
source/net/yacy/cora/sorting/WeakPriorityBlockingQueue.java
13 years ago
Michael Peter Christen
96c8119b50
added GeoLocation / GeoPoint classes which use less memory than
Location/Coordinates and have initializers with the correct order of lat,lon
coordinates
13 years ago
Michael Peter Christen
461a0ce052
removed warnings
13 years ago
Michael Peter Christen
2fe207f813
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
514700291a
moved Vocabulary to cora package (added in git 964406ad17)
13 years ago
Michael Peter Christen
0284a4d88f
more fixes for double precision of coordinates
13 years ago
Michael Peter Christen
964406ad17
added concurrency enhancement to xml parser
13 years ago
Michael Peter Christen
e0d8643226
- performance hacks
- added log warnings in case that search processes run into time-out
situations
- better concurrency for Integer formatter (used a non-synchronized
formatter before)
- bugfix for search termination (a poison pill was missing)
- added timeout parameters for search (again) -> target is, that they
are never reached.
13 years ago
Michael Peter Christen
6e83b02b83
- bugfix for surrogate file reader
- bugfix for location search: suppress empty search
13 years ago
Michael Peter Christen
9b4c699526
enhanced location search:
- search requests are now made using a map boundary
- search results are only computed for the map boundary
- the number of results is adapted to the results in the visible range
- added double-buffering for the search result markers
- added a search query option for the search results:
/radius/<lat>/<lon>/<radius>
13 years ago
Michael Peter Christen
4d3cc02168
replaced old bzip2 library with the better documented commons-compress
package from http://commons.apache.org/compress/
13 years ago
Michael Peter Christen
c15fcde1c8
add-on to latest commit
13 years ago
Michael Peter Christen
81737dcb18
removed stack trace from swf parser since we can't do anything there
13 years ago